Purpose



This guide is designed to give you information about programming in a 
UNIX system environment. It does not attempt to teach you how to write 
programs. Rather, it is intended to supplement texts on programming 
languages by concentrating on the other elements that are part of getting 
programs into operation. 



Audience and Prerequisite Knowledge 

As the title suggests, we are addressing programmers, especially those 
who have not worked extensively with the UNIX system. No special level 
of programming involvement is assumed. We hope the book will be useful 
to people who write only an occasional program as well as those who work 
on or manage large application development projects. 

Programmers in the expert class, or those engaged in developing system 
software, may find this guide lacks the depth of information they need. For 
them we recommend the Programmer's Reference Manual.

Knowledge of terminal use, of a UNIX system editor, and of the UNIX 
system directory/file structure is assumed. If you feel shaky about your
mastery of these basic tools, you might want to look over the User's Guide 
before tackling this one. 



Organization 

The material is organized into two parts and seventeen chapters, as fol- 
lows: 

■ Part 1, Chapter 1 — Overview 

Identifies the special features of the UNIX system that make up the 
programming environment: the concept of building blocks, pipes, 
special files, shell programming, etc. As a framework for the material 
that follows, three different levels of programming in a UNIX system 
are defined: single-user, applications, and systems programming. 

■ Chapter 2 — Programming Basics 

Describes the most fundamental utilities needed to get programs run- 
ning. 

■ Chapter 3 — Application Programming

Enlarges on many of the topics covered in the previous chapter with 
particular emphasis on how things change as the project grows 
bigger. Describes tools for keeping programming projects organized. 

■ Part 2, Chapters 4 through 17 — Support Tools, Descriptions, and 
Tutorials 

Includes detailed information about the use of many of the UNIX sys- 
tem tools. 

At the end of the text are an appendix on how to perform floating point
operations, a glossary, and an index.



The C Connection 

The UNIX system supports many programming languages, and C com- 
pilers are available on many different operating systems. Nevertheless, the 
relationship between the UNIX operating system and C has always been 
and remains very close. Most of the code in the UNIX operating system is 
C, and over the years many organizations using the UNIX system have 
come to use C for an increasing portion of their application code. Thus, 
while this guide is intended to be useful to you no matter what language(s) 
you are using, you will find that, unless there is a specific language- 
dependent point to be made, the examples assume you are programming in 
C. 



Hardware/Software Dependencies 

Nearly all the text in this book is accurate for any AT&T computer run- 
ning UNIX System V Release 3.0 or a later release, with the exception of 
hardware-specific information such as addresses. The hardware-specific 
information in this text reflects the way things work on an AT&T 3B2 Com- 
puter running UNIX System V at the Release 3.0 level. 

If you find commands that work a little differently in your UNIX system 
environment, it may be because you are running under a different release of
the software. If some commands just don't seem to exist at all, they may be 
members of packages not installed on your system. If you do find yourself 
trying to execute a non-existent command, talk to the administrators of your 
system to find out what you have available.

In Part 2 of this text, most of the chapters describe an individual tool or 
feature. A few of these tools are part of separate products with different 
releases and updates. For example, it is possible that you could be running
the latest issue of the C Programming Language Utilities on an earlier 
release of the UNIX operating system. There is nothing wrong with this 
configuration, but the features described in this text may not work with the 
release of the tool you have. 

Listed below are the chapters that describe tools with features that work 
in specific releases of UNIX System V. Also listed are parts of this text that 
have been recently enhanced. 

Chapter 4 This chapter describes the version of awk released in UNIX
System V Release 3.1 and described on the nawk(1) manual
page. If you use the nawk(1) command, this chapter
describes the version of awk you are using.

If you are running a release of UNIX System V earlier than
Release 3.1, you should keep Chapter 4 from your previous
Programmer's Guide. This older version of awk is described
on the awk(1) manual page.

Chapter 8 This chapter describes the shared libraries and related com- 
mands associated with UNIX System V Release 3.1. If you 
are running UNIX System V Release 3.0, you should keep 
Chapter 8 from your previous Programmer's Guide. Shared 
libraries are not supported on any release earlier than UNIX 
System V Release 3.0. 

Chapter 10 This chapter describes the curses/terminfo routines. The
text of this chapter has been enhanced and is more accurate
and more easily understood.

Appendix A This appendix describes to the sophisticated programmer
how to perform floating point operations on the UNIX system.
The operations described are supported by Issue 4.2
and later of the C Programming Language Utilities.

Index The index to this text has been expanded and improved. 

Notation Conventions

Whenever the text includes examples of output from the computer 
and/or commands entered by you, we follow the standard notation scheme
that is common throughout UNIX system documentation: 

■ Commands that you type in from your terminal are shown in bold 
type. 

■ Text that is printed on your terminal by the computer is shown in
constant width type. Constant width type is also used for code samples
because it allows the most accurate representation of spacing.
Spacing is often a matter of coding style, but is sometimes critical. 

■ Comments added to a display to show that part of the display has 
been omitted are shown in italic type and are indented to separate 
them from the text that represents computer output or input. Com- 
ments that explain the input or output are shown in the same type 
font as the rest of the display. 



Italics are also used to show substitutable values, such as filename,
when the format of a command is shown. 

■ There is an implied RETURN at the end of each command and menu 
response you enter. Where you may be expected to enter only a 
RETURN (as when you accept a menu default), the symbol <CR> is 
used. 

■ In cases where you are expected to enter a control character, it is 
shown as, for example, CTRL-D. This means that you press the d 
key on your keyboard while holding down the CTRL key. 

■ The dollar sign, $, and pound sign, #, symbols are the standard 
default prompt signs for an ordinary user and root respectively. 

■ When the # prompt is used in an example, it means the command 
illustrated may be used only by root. 

Command References

When commands are mentioned in a section of the text for the first 
time, a reference to the manual section where the command is formally 
described is included in parentheses: command(section). The numbered 
sections are located in the following manuals: 

Sections (1, 1C, 1G)                    User's Reference Manual
Sections (1, 1M), (4), (5), (7), (8)    System Administrator's Reference Manual
Sections (1), (2), (3), (4), (5)        Programmer's Reference Manual

Note that Section 1 is listed for all the manuals. Section 1 of the User's 
Reference Manual describes commands appropriate for general users. Section 
1 of the Programmer's Reference Manual describes commands appropriate for 
programmers. Section 1 of the System Administrator's Reference Manual 
describes commands appropriate for system administrators. 

Information in the Examples

While every effort has been made to present displays of information just 
as they appear on your terminal, it is possible that your system may produce 
slightly different output. Some displays depend on a particular machine 
configuration that may differ from yours. Changes between releases of the 
UNIX system software may cause small differences in what appears on your 
terminal. 

Where complete code samples are shown, we have tried to make sure 
they compile and work as represented. Where code fragments are shown, 
while we can't say that they have been compiled, we have attempted to 
maintain the same standards of coding accuracy for them. 

The 1983 Turing Award of the Association for Computing Machinery
was given jointly to Ken Thompson and Dennis Ritchie, the two men who 
first designed and developed the UNIX operating system. The award cita- 
tion said, in part: 

"The success of the UNIX system stems from its 
tasteful selection of a few key ideas and their 
elegant implementation. The model of the UNIX 
system has led a generation of software designers 
to new ways of thinking about programming. The 
genius of the UNIX system is its framework which 
enables programmers to stand on the work of others." 

As programmers working in a UNIX system environment, why should 
we care what Thompson and Ritchie did? Does it have any relevance for us 
today? 

It does, because understanding the thinking behind the system design
and the atmosphere in which it flowered can help us become productive
UNIX system programmers more quickly.

The Early Days 

You may already have read about how Ken Thompson came across a 
DEC PDP-7 machine sitting unused in a hallway at AT&T Bell Laboratories, 
and how he and Dennis Ritchie and a few of their colleagues used that as 
the original machine for developing a new operating system that became 
UNIX. 

The important thing to realize, however, is that what they were trying 
to do was fashion a pleasant computing environment for themselves. It was 
not, "Let's get together and build an operating system that will attract 
world-wide attention." 

The sequence in which elements of the system fell into place is interest- 
ing. The first piece was the file system, followed quickly by its organization 
into a hierarchy of directories and files. The view of everything, data 
stores, programs, commands, directories, even devices, as files of one type or 
another was critical, as was the idea of a file as a one-dimensional array of 
bytes with no other structure implied. The cleanness and simplicity of this 
way of looking at files has been a major contributing factor to a computer
environment that programmers and other users have found comfortable to 
work in. 

The next element was the idea of processes, with one process being able 
to create another and communicate with it. This innovative way of looking 
at running programs as processes led easily to the practice (quintessentially 
UNIX) of reusing code by calling it from another process. With the addi- 
tion of commands to manipulate files and an assembler to produce execut- 
able programs, the system was essentially able to function on its own. 

The next major development was the acquisition of a DEC PDP-11 and 
the installation of the new system on it. This has been described by Ritchie 
as a stroke of good luck, in that the PDP-11 was to become a hugely success- 
ful machine, its success to some extent adding momentum to the acceptance 
of the system that began to be known by the name of UNIX. 

By 1972 the innovative idea of pipes (connecting links between 
processes whereby the output of one becomes the input of the next) had 
been incorporated into the system, the operating system had been recoded 
in higher level languages (first B, then C), and had been dubbed with the 
name UNIX (coined by Brian Kernighan). By this point, the "pleasant com- 
puting environment" sought by Thompson and Ritchie was a reality; but 
some other things were going on that had a strong influence on the charac- 
ter of the product then and today. 
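
The two mechanisms just described, one process creating another and a
pipe carrying the output of one to the input of the next, can be sketched
in a few lines of C. This is an illustration of our own, not code from the
early system: it is written in present-day standard C against the pipe(2),
fork(2), read(2), write(2), and waitpid(2) interfaces, and the function
name send_through_pipe is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * send_through_pipe (hypothetical helper): create a pipe, fork a child
 * that writes msg into it, and read the bytes back in the parent.
 * Returns the number of bytes read, or -1 if the pipe or the child
 * could not be created. On success buf is NUL-terminated.
 */
long send_through_pipe(const char *msg, char *buf, size_t bufsize)
{
    int fd[2];          /* fd[0] is the read end, fd[1] the write end */
    pid_t pid;
    size_t total = 0;
    ssize_t n;

    assert(msg != NULL && buf != NULL && bufsize > 0);

    if (pipe(fd) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: its output goes into the pipe. */
        close(fd[0]);
        if (write(fd[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(fd[1]);
        _exit(0);
    }

    /* Parent: its input comes out of the pipe. */
    close(fd[1]);
    while (total < bufsize - 1 &&
           (n = read(fd[0], buf + total, bufsize - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fd[0]);

    waitpid(pid, NULL, 0);      /* collect the child's exit status */
    return (long)total;
}
```

A caller supplies a message and a buffer; on return the buffer holds
whatever the child wrote, much as the second command in a shell pipeline
sees the output of the first.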

It is worth pointing out that the UNIX system came out of an atmo- 
sphere that was totally different from that in which most commercially suc- 
cessful operating systems are produced. The more typical atmosphere is 
that described by Tracy Kidder in The Soul of a New Machine. In that case, 
dozens of talented programmers worked at white heat, in an atmosphere of 
extremely tight security, against murderous deadlines. By contrast, the 
UNIX system could be said to have had about a ten year gestation period. 
From the beginning it attracted the interest of a growing number of brilli- 
ant specialists, many of whom found in the UNIX system an environment 
that allowed them to pursue research and development interests of their 
own, but who in turn contributed additions to the body of tools available 
for succeeding ranks of UNIX programmers. 

Beginning in 1971, the system began to be used for applications within 
AT&T Bell Laboratories, and shortly thereafter (1974) was made available at 
low cost and without support to colleges and universities. These versions, 
called research versions and identified with Arabic numbers up through 7, 
occasionally grew on their own and fed back to the main system additional 
innovative tools. The widely-used screen editor vi(1), for example, was
added to the UNIX system by William Joy at the University of California,
Berkeley. In 1979, acceding to commercial demand, AT&T began offering
supported versions (called development versions) of the UNIX system. 
These are identified with Roman numerals and often have interim release 
numbers appended. The current development version, for example, is 
UNIX System V Release 3.0. 

Versions of the UNIX system being offered now by AT&T are coming 
from an environment more closely related, perhaps, to the standard 
software factory. Features are being added to new releases in response to 
the expressed needs of the market place. The essential quality of the UNIX 
system, however, remains as the product of the innovative thinking of its 
originators and the collegial atmosphere in which they worked. This qual- 
ity has on occasion been referred to as the UNIX philosophy, but what is 
meant is the way in which sophisticated programmers have come to work 
with the UNIX system. 



UNIX System Philosophy Simply Stated 

For as long as you are writing programs on a UNIX system you should 
keep this motto hanging on your wall: 

* Build on the work of others *

Unlike computer environments where each new project is like starting
with a blank canvas, on a UNIX system a good percentage of any
programming effort is lying there in bins, and lbins, and /usr/bins, not to
mention etc, waiting to be used.

The features of the UNIX system (pipes, processes, and the file system) 
contribute to this reusability, as does the history of sharing and contributing 
that extends back to 1969. You risk missing the essential nature of the 
UNIX system if you don't put this to work. 

UNIX System Tools and Where You Can Read
About Them 

The term "UNIX system tools" can stand some clarification. In the nar- 
rowest sense, it means an existing piece of software used as a component in 
a new task. In a broader context, the term is often used to refer to elements 
of the UNIX system that might also be called features, utilities, programs, 
filters, commands, languages, functions, and so on. It gets confusing 
because any of the things that might be called by one or more of these 
names can be, and often are, used in the narrow way as part of the solution 
to a programming problem. 



Tools Covered and Not Covered in this Guide 

The Programmer's Guide is about tools used in the process of creating pro- 
grams in a UNIX system environment, so let's take a minute to talk about 
which tools we mean, which ones are not going to be covered in this book, 
and where you might find information about those not covered here. Actu- 
ally, the subject of things not covered in this guide might be even more 
important to you than the things that are. We couldn't possibly cover 
everything you ever need to know about UNIX system tools in this one 
volume. 

Tools not covered in this text: 

■ the login procedure 

■ UNIX system editors and how to use them 

■ how the file system is organized and how you move around in it 

■ shell programming 

Information about these subjects can be found in the User's Guide and a 
number of commercially available texts. 

Tools covered here can be classified as follows: 

■ utilities for getting programs running 

■ utilities for organizing software development projects 

■ specialized languages

■ debugging and analysis tools 

■ compiled language components that are not part of the language
syntax, for example, standard libraries, system calls, and functions



The Shell as a Prototyping Tool 

Any time you log in to a UNIX system machine you are using the shell. 
The shell is the interactive command interpreter that stands between you 
and the UNIX system kernel, but that's only part of the story. Because of its
ability to start processes, direct the flow of control, field interrupts, and
redirect input and output, it is a full-fledged programming language.
Programs that use these capabilities are known as shell procedures or shell
scripts.

Much innovative use of the shell involves stringing together commands 
to be run under the control of a shell script. The dozens and dozens of 
commands that can be used in this way are documented in the User's Refer- 
ence Manual. Time spent with the User's Reference Manual can be rewarding. 
Look through it when you are trying to find a command with just the right 
option to handle a knotty programming problem. The more familiar you 
become with the commands described in the manual pages the more you 
will be able to take full advantage of the UNIX system environment. 

It is not our purpose here to instruct you in shell programming. What 
we want to stress here is the important part that shell procedures can play 
in developing prototypes of full-scale applications. While understanding all 
the nuances of shell programming can be a fairly complex task, getting a 
shell procedure up and running is far less time-consuming than writing, 
compiling and debugging compiled code. 

This ability to get a program into production quickly is what makes the 
shell a valuable tool for program development. Shell programming allows 
you to "build on the work of others" to the greatest possible degree, since it 
allows you to piece together major components simply and efficiently. 
Many times even large applications can be done using shell procedures. 
Even if the application is initially developed as a prototype system for test- 
ing purposes rather than being put into production, many months of work 
can be saved. 

With a prototype for testing, the range of possible user errors can be
determined, something that is not always easy to plan out when an
application is being designed. The method of dealing with strange user input
can be worked out inexpensively, avoiding large re-coding problems.

A common occurrence in the UNIX system environment is to find that 
an available UNIX system tool can accomplish with a couple of lines of 
instructions what might take a page and a half of compiled code. Shell pro- 
cedures can intermix compiled modules and regular UNIX system com- 
mands to let you take advantage of work that has gone before. 

We distinguish among three programming environments to emphasize
that the information needs and the way in which UNIX system tools are 
used differ from one environment to another. We do not intend to imply a 
hierarchy of skill or experience. Highly-skilled programmers with years of 
experience can be found in the "single-user" category, and relative newco- 
mers can be members of an application development or systems program- 
ming team. 



Single-User Programmer 

Programmers in this environment are writing programs only to ease the 
performance of their primary job. The resulting programs might well be 
added to the stock of programs available to the community in which the 
programmer works. This is similar to the atmosphere in which the UNIX 
system thrived; someone develops a useful tool and shares it with the rest 
of the organization. Single-user programmers may not have externally 
imposed requirements, or co-authors, or project management concerns. The 
programming task itself drives the coding very directly. One advantage of 
a timesharing system such as UNIX is that people with programming skills 
can be set free to work on their own without having to go through formal 
project approval channels and perhaps wait for months for a programming 
department to solve their problems. 

Single-user programmers need to know how to: 

■ select an appropriate language 

■ compile and run programs 

■ use system libraries 

■ analyze programs 

■ debug programs 

■ keep track of program versions 

Most of the information to perform these functions at the single-user 
level can be found in Chapter 2. 

Application Programming

Programmers working in this environment are developing systems for 
the benefit of other, non-programming users. Most large commercial com- 
puter applications still involve a team of applications development program- 
mers. They may be employees of the end-user organization or they may 
work for a software development firm. Some of the people working in this 
environment may be more in the project management area than working 
programmers. 

Information needs of people in this environment include all the topics 
in Chapter 2, plus additional information on: 

■ software control systems 

■ file and record locking 

■ communication between processes 

■ shared memory 

■ advanced debugging techniques 

These topics are discussed in Chapter 3. 

Systems Programmers 

These are programmers engaged in writing software tools that are part 
of, or closely related to, the operating system itself. The project may involve
writing a new device driver, a data base management system or an enhance- 
ment to the UNIX system kernel. In addition to knowing their way around 
the operating system source code and how to make changes and enhance- 
ments to it, they need to be thoroughly familiar with all the topics covered 
in Chapters 2 and 3. 

Summary

In this overview chapter we have described the way that the UNIX sys- 
tem developed and the effect that has on the way programmers now work 
with it. We have described what is and is not to be found in the other 
chapters of this guide to help programmers. We have also suggested that in 
many cases programming problems may be easily solved by taking advan- 
tage of the UNIX system interactive command interpreter known as the 
shell. Finally, we identified three programming environments in the hope 
that it will help orient the reader to the organization of the text in the 
remaining chapters. 

Introduction



The information in this chapter is for anyone just learning to write pro- 
grams to run in a UNIX system environment. In Chapter 1 we identified 
one group of UNIX system users as single-user programmers. People in 
that category, particularly those who are not deeply interested in program- 
ming, may find this chapter (plus related reference manuals) tells them as 
much as they need to know about coding and running programs on a UNIX 
system computer. 

Programmers whose interest does run deeper, who are part of an appli- 
cation development project, or who are producing programs on one UNIX 
system computer that are being ported to another, should view this chapter 
as a starter package. 

How do you decide which programming language to use in a given
situation? One answer could be, "I always code in HAIRBOL, because that's 
the language I know best." Actually, in some circumstances that's a legiti- 
mate answer. But assuming more than one programming language is avail- 
able to you, that different programming languages have their strengths and 
weaknesses, and assuming that once you've learned to use one program- 
ming language it becomes relatively easy to learn to use another, you might 
approach the problem of language selection by asking yourself questions 
like the following: 

■ What is the nature of the task this program is to do? 

Does the task call for the development of a complex algorithm, or is 
this a simple procedure that has to be done on a lot of records? 

■ Does the programming task have many separate parts? 

Can the program be subdivided into separately compilable functions, 
or is it one module? 

■ How soon does the program have to be available? 

Is it needed right now, or do I have enough time to work out the 
most efficient process possible? 

■ What is the scope of its use? 

Am I the only person who will use this program, or is it going to be 
distributed to the whole world? 

■ Is there a possibility the program will be ported to other systems? 

■ What is the life-expectancy of the program? 

Is it going to be used just a few times, or will it still be going strong 
five years from now? 



Supported Languages in a UNIX System Environment 

By "languages" we mean those offered by AT&T for use on an AT&T 3B 
Computer running a current release of UNIX System V. Since these are 
separately purchasable items, not all of them will necessarily be installed on 
your machine. On the other hand, you may have languages available on 
your machine that came from another source and are not mentioned in this
discussion. Be that as it may, in this section and the one to follow we give 
brief descriptions of the nature of a) six full-scale programming languages, 
and b) a number of special purpose languages. 

C Language 

C is intimately associated with the UNIX system since it was originally 
developed for use in recoding the UNIX system kernel. If you need to use 
a lot of UNIX system function calls for low-level I/O, memory or device 
management, or inter-process communication, C language is a logical first 
choice. Most programs, however, don't require such direct interfaces with 
the operating system so the decision to choose C might better be based on 
one or more of the following characteristics: 

■ a variety of data types: character, integer, long integer, float, and 
double 

■ low level constructs (most of the UNIX system kernel is written in C) 

■ derived data types such as arrays, functions, pointers, structures and 
unions 

■ multi-dimensional arrays 

■ scaled pointers, and the ability to do pointer arithmetic 

■ bit-wise operators 

■ a variety of flow-of-control statements: if, if-else, switch, while, do- 
while, and for 

■ a high degree of portability 
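
Several of the characteristics listed above can be seen together in a few
lines. The fragment below is a sketch of our own, written in present-day
standard C (function prototypes and const may not be accepted by older
compilers); the names table, row_sum, and bits_set are hypothetical and
chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 2
#define COLS 3

/* A structure (a derived type) holding a small fixed-size table. */
struct table {
    const char *name;           /* pointer to character */
    int cell[ROWS][COLS];       /* multi-dimensional array */
};

/*
 * Sum one row using pointer arithmetic. Incrementing p advances it by
 * sizeof(int) bytes rather than by one byte; that is the sense in
 * which C pointers are "scaled".
 */
long row_sum(const struct table *t, size_t row)
{
    const int *p, *end;
    long sum = 0;

    assert(t != NULL && row < ROWS);

    p = t->cell[row];           /* first element of the chosen row */
    end = p + COLS;             /* one past the last element */
    for (; p < end; p++)
        sum += *p;
    return sum;
}

/* Count the 1 bits in a word using the bit-wise operators & and >>. */
int bits_set(unsigned long word)
{
    int count = 0;

    while (word != 0) {
        count += (int)(word & 1UL);
        word >>= 1;
    }
    return count;
}
```

Given a table whose first row holds 1, 2, and 3, row_sum(&t, 0) returns 6,
and bits_set(0xF0UL) returns 4. Each function is small enough to be kept
in its own source file and compiled separately.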

C is a language that lends itself readily to structured programming. It is 
natural in C to think in terms of functions. The next logical step is to view 
each function as a separately compilable unit. This approach (coding a
program in small pieces) eases the job of making changes and/or
improvements. If this begins to sound like the UNIX system philosophy of building
new programs from existing tools, it's not just coincidence. As you create 
functions for one program you will surely find that many can be picked up, 
or quickly revised, for another program. 

A difficulty with C is that it takes a fairly concentrated use of the
language over a period of several months to reach your full potential as a C 
programmer. If you are a casual programmer, you might make life easier 
for yourself if you choose a less demanding language. 

FORTRAN 

The oldest of the high-level programming languages, FORTRAN is still 
highly prized for its variety of mathematical functions. If you are writing a 
program for statistical analysis or other scientific applications, FORTRAN is 
a good choice. An original design objective was to produce a language with 
good operating efficiency. This has been achieved at the expense of some 
flexibility in the area of type definition and data abstraction. There is, for 
example, only a single form of the iteration statement. FORTRAN also 
requires using a somewhat rigid format for input of lines of source code. 
This shortcoming may be overcome by using one of the UNIX system tools 
designed to make FORTRAN more flexible. 

Pascal 

Originally designed as a teaching tool for block structured program- 
ming, Pascal has gained quite a wide acceptance because of its straightfor- 
ward style. Pascal is highly structured and allows system level calls (charac- 
teristics it shares with C). Since the intent of the developers, however, was 
to produce a language to teach people about programming it is perhaps best 
suited to small projects. Among its inconveniences are its lack of facilities 
for specifying initial values for variables and limited file processing capabil- 
ity. 

COBOL 

Probably more programmers are familiar with COBOL than with any
other single programming language. It is frequently used in business
applications because its strengths lie in the management of input/output and in
defining record layouts.

It is somewhat cumbersome to use COBOL for complex algorithms, but 
it works well in cases where many records have to be passed through a sim- 
ple process; a payroll withholding tax calculation, for example. It is a rather 
tedious language to work with because each program requires a lengthy 
amount of text merely to describe record layouts, processing environment 
and variables used in the code. The COBOL language is wordy so the com- 
pilation process is often quite complex. Once written and put into produc- 
tion, COBOL programs have a way of staying in use for years, and what 
might be thought of by some as wordiness comes to be considered
self-documentation. The investment in programmer time often makes them
resistant to change. 

BASIC 

The most commonly heard comment about BASIC is that it is easy to 
learn. With the spread of personal microcomputers many people have 
learned BASIC because it is simple to produce runnable programs in very 
little time. It is difficult, however, to use BASIC for large programming pro- 
jects. It lacks the provision for structured flow-of-control, requires that 
every variable used be defined for the entire program and has no way of 
transferring values between functions and calling programs. Most versions 
of BASIC run as interpreted code rather than compiled. That makes for 
slower running programs. Despite its limitations, however, it is useful for 
getting simple procedures into operation quickly. 

Assembly Language 

The closest approach to machine language, assembly language is specific 
to the particular computer on which your program is to run. High-level 
languages are translated into the assembly language for a specific processor 
as one step of the compilation. The most common need to work in assem- 
bly language arises when you want to do some task that is not within the 
scope of a high-level language. Since assembly language is machine- 
specific, programs written in it are not portable. 



Special Purpose Languages 

In addition to the above formal programming languages, the UNIX sys- 
tem environment frequently offers one or more of the special purpose 
languages listed below. 



NOTE 



Since UNIX system utilities and commands are packaged in functional 
groupings, it is possible that not all the facilities mentioned will be 
available on all systems. 
awk

awk (its name is an acronym constructed from the initials of its 
developers) scans an input file for lines that match pattern(s) described in a 
specification file. On finding a line that matches a pattern, awk performs 
actions also described in the specification. It is not uncommon that an awk 
program can be written in a couple of lines to do functions that would take 
a couple of pages to describe in a programming language like FORTRAN or 
C. For example, consider a case where you have a set of records that consist 
of a key field and a second field that represents a quantity. You have sorted 
the records by the key field, and you now want to add the quantities for 
records with duplicate keys and output a file in which no keys are dupli- 
cated. The pseudo-code for such a program might look like this: 



Read the first record into a hold area;
Read additional records until EOF;
{

    If the key matches the key of the record in the hold area,
        add the quantity to the quantity field of the held record;

    If the key does not match the key of the held record,
        write the held record,
        move the new record to the hold area;

}

At EOF, write out the last record from the hold area.



An awk program to accomplish this task would look like this: 

    { qty[$1] += $2 }
    END { for (key in qty) print key, qty[key] }

This illustrates only one characteristic of awk: its ability to work with
associative arrays. With awk, the input file does not have to be sorted, which is
a requirement of the pseudo-program.
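
For comparison, the pseudo-code above might be written in C roughly as
follows. This is only a sketch under stated assumptions: the function name
sum_sorted, the 31-character key limit, and the record layout (a key and an
integer quantity separated by white space) are our own choices for
illustration, and the input is assumed to be sorted by key already.

```c
#include <stdio.h>
#include <string.h>

#define KEYLEN 32   /* assumed maximum key length, including the null */

/* Hypothetical helper: read "key quantity" records, already sorted by
 * key, from `in`; add up the quantities of records that share a key;
 * write one line per distinct key to `out`.
 * Returns the number of lines written. */
int sum_sorted(FILE *in, FILE *out)
{
    char held_key[KEYLEN], key[KEYLEN];
    long held_qty, qty;
    int written = 0;

    /* Read the first record into the hold area. */
    if (fscanf(in, "%31s %ld", held_key, &held_qty) != 2)
        return 0;

    /* Read additional records until EOF. */
    while (fscanf(in, "%31s %ld", key, &qty) == 2) {
        if (strcmp(key, held_key) == 0) {
            held_qty += qty;            /* same key: accumulate */
        } else {
            fprintf(out, "%s %ld\n", held_key, held_qty);
            written++;
            strcpy(held_key, key);      /* new record becomes the held one */
            held_qty = qty;
        }
    }

    /* At EOF, write out the last record from the hold area. */
    fprintf(out, "%s %ld\n", held_key, held_qty);
    return written + 1;
}
```

Even this compact version is several times the length of the awk program,
and it still depends on the input being sorted.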
lex

lex is a lexical analyzer that can be added to C or FORTRAN programs.
A lexical analyzer is interested in the vocabulary of a language rather than
its grammar, which is a system of rules defining the structure of a language.
lex can produce C language subroutines that recognize regular expressions
specified by the user, take some action when a regular expression is
recognized, and pass the output stream on to the next program.

yacc 

yacc (Yet Another Compiler Compiler) is a tool for describing an input
language to a computer program. yacc produces a C language subroutine
that parses an input stream according to rules laid down in a specification
file. The yacc specification file establishes a set of grammar rules together
with actions to be taken when tokens in the input match the rules. lex may
be used with yacc to control the input process and pass tokens to the parser
that applies the grammar rules.

M4 

M4 is a macro processor that can be used as a preprocessor for assembly
language and C programs. It is described in Section (1) of the Programmer's
Reference Manual.

bc and dc

bc enables you to use a computer terminal as you would a programmable
calculator. You can edit a file of mathematical computations and call bc
to execute them. The bc program uses dc. You can use dc directly, if you
want, but it takes a little getting used to since it works with reverse Polish
notation. That means you enter numbers into a stack followed by the
operator. bc and dc are described in Section (1) of the User's Reference
Manual.

curses 

Actually a library of C functions, curses is included in this list because 
the set of functions just about amounts to a sub-language for dealing with 
terminal screens. If you are writing programs that include interactive user 
screens, you will want to become familiar with this group of functions. 

In addition to all the foregoing, don't overlook the possibility of using 
shell procedures. 
The last two steps in most compilation systems in the UNIX system
environment are the assembler and the link editor. The compilation system 
produces assembly language code. The assembler translates that code into 
the machine language of the computer the program is to run on. The link 
editor resolves all undefined references and makes the object module exe- 
cutable. With most languages on the UNIX system the assembler and link 
editor produce files in what is known as the Common Object File Format 
(COFF). A common format makes it easier for utilities that depend on 
information in the object file to work on different machines running 
different versions of the UNIX system. 

In the Common Object File Format an object file contains: 

■ a file header 

■ optional secondary header 

■ a table of section headers 

■ data corresponding to the section header(s) 

■ relocation information 

■ line numbers 

■ a symbol table 

■ a string table 

An object file is made up of sections. Usually, there are at least two: 
.text and .data. Some object files contain a section called .bss. (.bss is an
assembly language pseudo-op that originally stood for "block started by 
symbol.") .bss, when present, holds uninitialized data. Options of the com- 
pilers cause different items of information to be included in the Common 
Object File Format. For example, compiling a program with the -g option 
adds line numbers and other symbolic information that is needed for the 
sdb (Symbolic Debugger) command to be fully effective. You can spend 
many years programming without having to worry too much about the con- 
tents and organization of the Common Object File Format, so we are not 
going into any further depth of detail at this point. Detailed information is 
available in Chapter 11 of this guide. 
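
To make the three sections a little more concrete, here is a small sketch of
where the pieces of a C source file usually end up. The names limit, counter,
buffer, and bump are invented for the illustration, and the placement noted
in the comments is only what compilation systems typically do; the actual
layout is up to the compiler, assembler, and link editor on your system.

```c
int limit = 100;            /* initialized data: typically placed in .data */
int counter;                /* uninitialized data: typically placed in .bss */
static char buffer[4096];   /* also uninitialized, so typically in .bss */

/* The machine instructions generated for this function typically
 * go in the .text section. */
int bump(void)
{
    if (counter < limit)
        counter++;
    buffer[0] = (char)counter;
    return counter;
}
```

Because .bss holds only uninitialized data, a large array such as buffer
generally adds very little to the size of the object file itself.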
Compiling and Link Editing

The command used for compiling depends on the language used:

■ for C programs, cc both compiles and link edits

■ for FORTRAN programs, f77 both compiles and link edits

Compiling C Programs 

To use the C compilation system you must have your source code in a 
file with a filename that ends in the characters .c, as in mycode.c. The com- 
mand to invoke the compiler is: 

cc mycode.c 

If the compilation is successful the process proceeds through the link edit 
stage and the result will be an executable file by the name of a.out. 

Several options to the cc command are available to control its operation. 
The most used options are: 

-c          causes the compilation system to suppress the link edit
            phase. This produces an object file (mycode.o) that can be
            link edited at a later time with a cc command without the
            -c option.

-g          causes the compilation system to generate special
            information about variables and language statements used by
            the symbolic debugger sdb. If you are going through the stage
            of debugging your program, use this option.

-O          causes the inclusion of an additional optimization phase.
            This option is logically incompatible with the -g option.
            You would normally use -O after the program has been
            debugged, to reduce the size of the object file and increase
            execution speed.

-p          causes the compilation system to produce code that works in
            conjunction with the prof(1) command to produce a runtime
            profile of where the program is spending its time. Useful in
            identifying which routines are candidates for improved
            code.

-o outfile  tells cc to tell the link editor to use the specified name for
            the executable file, rather than the default a.out.

Other options can be used with cc. Check the Programmer's Reference
Manual.

If you enter the cc command using a file name that ends in .s, the com- 
pilation system treats it as assembly language source code and bypasses all 
the steps ahead of the assembly step. 

Compiling FORTRAN Programs 

The f77 command invokes the FORTRAN compilation system. The 
operation of the command is similar to that of the cc command, except the 
source code file(s) must have a .f suffix. The f77 command compiles your 
source code and calls in the link editor to produce an executable file whose 
name is a.out. 

The following command line options have the same meaning as they do 
for the cc command: 

-c, -p, -O, -g, and -o outfile

Loading and Running BASIC Programs 

BASIC programs can be invoked in two ways: 

■ With the command 
basic bscpgm.b 

where bscpgm.b is the name of the file that holds your BASIC state- 
ments. This tells the UNIX system to load and run the program. If 
the program includes a run statement naming another program, you 
will chain from one to the other. Variables specified in the first can 
be preserved for the second with the common statement. 

■ By setting up a shell script. 

Compiler Diagnostic Messages 

The C compiler generates error messages for statements that don't com- 
pile. The messages are generally quite understandable, but in common with 
most language compilers they sometimes point several statements beyond 
where the actual error occurred. For example, if you inadvertently put an 
extra ; at the end of an if statement, a subsequent else will be flagged as a
syntax error. In the case where a block of several statements follows the if,
the line number of the syntax error caused by the else will start you looking
for the error well past where it is. Unbalanced curly braces, { }, are another
common producer of syntax errors.
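
The stray semicolon case can be sketched as follows. The function sign_of is
made up for the illustration. The faulty form is shown only inside a comment,
since it does not compile; the exact wording and line number of the message
you get will depend on your compiler.

```c
/* Hypothetical function used only to illustrate the diagnostic.
 *
 * The faulty version is kept inside this comment because it does
 * not compile:
 *
 *     if (n < 0);          <- stray semicolon ends the if statement
 *         sign = -1;
 *     else                 <- the compiler reports the error here
 *         sign = 1;
 */
int sign_of(int n)
{
    int sign;

    if (n < 0)              /* no semicolon after the condition */
        sign = -1;
    else
        sign = 1;
    return sign;
}
```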

Link Editing 

The ld command invokes the link editor directly. The typical user,
however, seldom invokes ld directly. A more common practice is to use a
language compilation control command (such as cc) that invokes ld. The
link editor combines several object files into one, performs relocation,
resolves external symbols, incorporates startup routines, and supports
symbol table information used by sdb. You may, of course, start with a single
object file rather than several. The resulting executable module is left in a
file named a.out.

Any file named on the ld command line that is not an object file
(typically, a name ending in .o) is assumed to be an archive library or a file of
link editor directives. The ld command has some 16 options. We are going
to describe four of them. These options should be fed to the link editor by
specifying them on the cc command line if you are doing both jobs with the
single command, which is the usual case.

-o outfile  provides a name to be used to replace a.out as the name
            of the output file. Obviously, the name a.out is of only
            temporary usefulness. If you know the name you want
            to use to invoke your program, you can provide it here.
            Of course, it may be equally convenient to do this:

                mv a.out progname

            when you want to give your program a less temporary
            name.

-lx         directs the link editor to search a library libx.a, where x
            is up to nine characters. For C programs, libc.a is
            automatically searched if the cc command is used. The
            -lx option is used to bring in libraries not normally in
            the search path such as libm.a, the math library. The -lx
            option can occur more than once on a command line,
            with different values for the x. A library is searched
            when its name is encountered, so the placement of the
            option on the command line is important. The safest
            place to put it is at the end of the command line. The
            -lx option is related to the -L option.

-L dir      changes the libx.a search sequence to search in the
            specified directory before looking in the default library
            directories, usually /lib or /usr/lib. This is useful if you
            have different versions of a library and you want to point
            the link editor to the correct one. It works on the
            assumption that once a library has been found no further
            searching for that library is necessary. Because -L
            diverts the search for the libraries specified by -lx
            options, it must precede such options on the command
            line.

-u symname  enters symname as an undefined symbol in the symbol
            table. This is useful if you are loading entirely from an
            archive library, because initially the symbol table is
            empty and needs an unresolved reference to force the
            loading of the first routine.

When the link editor is called through cc, a startup routine (typically 
/lib/crt0.o for C programs) is linked with your program. This routine calls
exit(2) after execution of the main program. 

The link editor accepts a file containing link editor directives. The 
details of the link editor command language can be found in Chapter 12. 
Language and the UNIX System

When a program is run in a computer it depends on the operating sys- 
tem for a variety of services. Some of the services such as bringing the pro- 
gram into main memory and starting the execution are completely tran- 
sparent to the program. They are, in effect, arranged for in advance by the 
link editor when it marks an object module as executable. As a programmer 
you seldom need to be concerned about such matters. 

Other services, however, such as input/output, file management, and storage
allocation do require work on the part of the programmer. These connections
between a program and the UNIX operating system are what is meant
by the term UNIX system/language interface. The topics included in this
section are:

■ How arguments are passed to a program 

■ System calls and subroutines 

■ Header files and libraries 

■ Input/Output

■ Processes 

■ Error Handling, Signals, and Interrupts 



Why C Is Used to Illustrate the Interface 

Throughout this section C programs are used to illustrate the interface 
between the UNIX system and programming languages because C programs 
make more use of the interface mechanisms than other high-level 
languages. What is really being covered in this section then is the UNIX
system/C Language interface. The way that other languages deal with these
topics is described in the user's guides for those languages. 
How Arguments Are Passed to a Program

Information or control data can be passed to a C program as arguments 
on the command line. When the program is run as a command, arguments 
on the command line are made available to the function main in two 
parameters, an argument count and an array of pointers to character strings. 
(Every C program is required to have an entry module by the name of 
main.) Since the argument count is always given, the program does not 
have to know in advance how many arguments to expect. The character 
strings pointed at by elements of the array of pointers contain the argument 
information. 

The arguments are presented to the program traditionally as argc and
argv, although any names you choose will work. argc is an integer that
gives the count of the number of arguments. Since the command itself is
considered to be the first argument, argv[0], the count is always at least one.
argv is an array of pointers to character strings (arrays of characters
terminated by the null character \0).
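
A minimal sketch of walking the argument array is shown here. The helper
name list_args and its out parameter are our own inventions, and the code is
written in ANSI-style C rather than the older style used in the figures; it
is meant only to show how argc bounds the loop over argv.

```c
#include <stdio.h>

/* Hypothetical helper: print each argument on its own line, prefixed by
 * its index, and return how many arguments followed the command name. */
int list_args(int argc, char *argv[], FILE *out)
{
    int i;

    for (i = 0; i < argc; i++)
        fprintf(out, "argv[%d] = %s\n", i, argv[i]);
    return argc - 1;        /* argv[0] is the command itself */
}
```

A main function would typically just pass its own parameters along, as in
list_args(argc, argv, stdout).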

If you plan to pass runtime parameters to your program, you need to 
include code to deal with the information. Two possible uses of runtime 
parameters are: 

■ as control data. Use the information to set internal flags that control 
the operation of the program. 

■ to provide a variable filename to the program. 

Figures 2-1 and 2-2 show program fragments that illustrate these uses. 
    #include <stdio.h>

    #define TRUE    1
    #define FALSE   0

    main(argc, argv)
    int argc;
    char *argv[];
    {
        void exit();
        int oflag = FALSE;
        int pflag = FALSE;      /* Function Flags */
        int rflag = FALSE;
        int ch;

        while ((ch = getopt(argc, argv, "opr")) != EOF)
        {
            /* For options present, set flag to TRUE */
            /* If no options present, print error message */

            switch (ch)
            {
            case 'o':
                oflag = TRUE;
                break;
            case 'p':
                pflag = TRUE;
                break;
            case 'r':
                rflag = TRUE;
                break;
            default:
                (void)fprintf(stderr,
                    "Usage: %s [-opr]\n", argv[0]);
                exit(2);
            }
        }
        .
        .
        .
    }

Figure 2-1: Using Command Line Arguments to Set Flags
    #include <stdio.h>

    main(argc, argv)
    int argc;
    char *argv[];
    {
        FILE *fopen(), *fin;
        void perror(), exit();

        if (argc > 1)
        {
            if ((fin = fopen(argv[1], "r")) == NULL)
            {
                /* First string (%s) is program name (argv[0]) */
                /* Second string (%s) is name of file that could */
                /* not be opened (argv[1]) */

                (void)fprintf(stderr,
                    "%s: cannot open %s: ",
                    argv[0], argv[1]);
                perror("");
                exit(2);
            }
        }
        .
        .
        .
    }

Figure 2-2: Using argv[n] Pointers to Pass a Filename



The shell, which makes arguments available to your program, considers 
an argument to be any non-blank characters separated by blanks or tabs. 
Characters enclosed in double quotes ("abc def") are passed to the program 
as one argument even if blanks or tabs are among the characters. It goes 
without saying that you are responsible for error checking and otherwise 
making sure the argument received is what your program expects it to be. 
A third argument is also present, in addition to argc and argv. The
third argument, known as envp, is an array of pointers to environment vari- 
ables. You can find more information on envp in the Programmer's Reference 
Manual under exec(2) and environ(5). 
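
As a rough illustration, the sketch below searches an environment array for
one variable. The helper name env_value is hypothetical, and the code assumes
the usual convention that each entry has the form NAME=value and that the
array ends with a null pointer. In practice the library routine getenv(3C)
does this job for you.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: search an environment array of "NAME=value"
 * strings (terminated by a null pointer) for `name`.
 * Returns a pointer to the value, or NULL if the name is not present. */
char *env_value(char *envp[], const char *name)
{
    size_t len = strlen(name);
    int i;

    for (i = 0; envp[i] != NULL; i++) {
        /* Match the whole name, then require '=' right after it. */
        if (strncmp(envp[i], name, len) == 0 && envp[i][len] == '=')
            return envp[i] + len + 1;
    }
    return NULL;
}
```

A program that declares main with three parameters (argc, argv, envp) could
call env_value(envp, "HOME") to look up the value of HOME.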



System Calls and Subroutines 

System calls are requests from a program for an action to be performed 
by the UNIX system kernel. Subroutines are precoded modules used to sup- 
plement the functionality of a programming language. 

Both system calls and subroutines look like functions such as those you 
might code for the individual parts of your program. There are, however, 
differences between them: 

■ At link edit time, the code for subroutines is copied into the object 
file for your program; the code invoked by a system call remains in 
the kernel. 

■ At execution time, subroutine code is executed as if it was code you 
had written yourself; a system function call is executed by switching 
from your process area to the kernel. 

This means that while subroutines make your executable object file
larger, runtime overhead for context switching may be less and execution
may be faster.
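
The difference can be seen by writing the same message two ways. In this
sketch the function names put_with_write and put_with_stdio are invented, and
we assume a system that supplies the header unistd.h for the declaration of
write; on other systems the declaration may live elsewhere.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* System call: each call to write(2) asks the kernel to transfer the
 * bytes to the file descriptor.  Returns bytes written, or -1. */
long put_with_write(int fd, const char *msg)
{
    return (long)write(fd, msg, strlen(msg));
}

/* Subroutine: fputs(3S) runs as part of the program and copies the
 * string into the stream's buffer; the kernel becomes involved only
 * when the buffer is flushed.  Returns 0 on success, EOF on error. */
int put_with_stdio(FILE *fp, const char *msg)
{
    if (fputs(msg, fp) == EOF)
        return EOF;
    return 0;
}
```

A program that produces its output in many small pieces usually does better
with the buffered subroutine, since far fewer switches to the kernel are
needed.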

Categories of System Calls and Subroutines 

System calls divide fairly neatly into the following categories: 

■ file access 

■ file and directory manipulation 

■ process control 

■ environment control and status information 

You can generally tell the category of a subroutine by the section of the 
Programmer's Reference Manual in which you find its manual page. However, 
the first part of Section 3 (3C and 3S) covers such a variety of subroutines it 
might be helpful to classify them further. 
■ The subroutines of sub-class 3S constitute the UNIX system/C
Language standard I/O, an efficient I/O buffering scheme for C.

■ The subroutines of sub-class 3C do a variety of tasks. They have in 
common the fact that their object code is stored in libc.a. They can 
be divided into the following categories: 

□ string manipulation 

□ character conversion 

□ character classification 

□ environment management 

□ memory management 

Figure 2-3 lists the functions that compose the standard I/O subroutines. 
Frequently, one manual page describes several related functions. In Figure 
2-3 the left hand column contains the name that appears at the top of the 
manual page; the other names in the same row are related functions 
described on the same manual page. 
Function Name(s)                        Purpose

fclose    fflush                        close or flush a stream
ferror    feof      clearerr  fileno    stream status inquiries
fopen     freopen   fdopen              open a stream
fread     fwrite                        binary input/output
fseek     rewind    ftell               reposition a file pointer in a stream
getc      getchar   fgetc     getw      get a character or word from a stream
gets      fgets                         get a string from a stream
popen     pclose                        begin or end a pipe to/from a process
printf    fprintf   sprintf             print formatted output
putc      putchar   fputc     putw      put a character or word on a stream
puts      fputs                         put a string on a stream
scanf     fscanf    sscanf              convert formatted input
setbuf    setvbuf                       assign buffering to a stream
system                                  issue a command through the shell
tmpfile                                 create a temporary file
tmpnam    tempnam                       create a name for a temporary file
ungetc                                  push character back into input stream
vprintf   vfprintf  vsprintf            print formatted output of a varargs
                                        argument list

For all functions: #include <stdio.h>

The first function name in each row gives the location in
the Programmer's Reference Manual, Section 3.

Figure 2-3: C Language Standard I/O Subroutines
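
To show a few of the standard I/O subroutines working together, here is a
short sketch that copies one stream to another. The function name
copy_stream is our own, and the code is written in ANSI-style C; it is one
plausible arrangement, not the only one.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything from `in` to `out` one character
 * at a time using getc and putc.  Returns the number of characters
 * copied, or -1 if either stream reports an error. */
long copy_stream(FILE *in, FILE *out)
{
    long n = 0;
    int ch;                 /* int, not char, so EOF can be told apart */

    while ((ch = getc(in)) != EOF) {
        if (putc(ch, out) == EOF)
            return -1;
        n++;
    }
    if (ferror(in) || fflush(out) == EOF)
        return -1;
    return n;
}
```

Called as copy_stream(stdin, stdout), this behaves like a very simple
filter. Because the streams are buffered, the character-at-a-time loop does
not mean one request to the kernel per character.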



Figure 2-4 lists string handling functions that are grouped under the 
heading string(3C) in the Programmer's Reference Manual.
String Operations

strcat(s1, s2)        append a copy of s2 to the end of s1.

strncat(s1, s2, n)    append at most n characters from s2 to the end
                      of s1.

strcmp(s1, s2)        compare two strings. Returns an integer less
                      than, greater than, or equal to 0 to show that
                      s1 is lexicographically less than, greater than,
                      or equal to s2.

strncmp(s1, s2, n)    compare at most n characters from the two strings.
                      Results are otherwise identical to strcmp.

strcpy(s1, s2)        copy s2 to s1, stopping after the null character
                      (\0) has been copied.

strncpy(s1, s2, n)    copy exactly n characters from s2 to s1. The copy
                      is truncated if s2 is longer than n, or padded
                      with null characters if s2 is shorter than n.

strdup(s)             returns a pointer to a new string that is a
                      duplicate of the string pointed to by s.

strchr(s, c)          returns a pointer to the first occurrence of
                      character c in string s, or a NULL pointer if c
                      is not in s.

strrchr(s, c)         returns a pointer to the last occurrence of
                      character c in string s, or a NULL pointer if c
                      is not in s.

strlen(s)             returns the number of characters in s up to
                      the first null character.

strpbrk(s1, s2)       returns a pointer to the first occurrence in s1
                      of any character from s2, or a NULL pointer if
                      no character from s2 occurs in s1.

strspn(s1, s2)        returns the length of the initial segment of s1,
                      which consists entirely of characters from s2.

strcspn(s1, s2)       returns the length of the initial segment of s1,
                      which consists entirely of characters not from
                      s2.

strtok(s1, s2)        break s1 into tokens, each delimited by one or
                      more characters from s2.

For all functions: #include <string.h>

string.h provides extern definitions of the string functions.

Figure 2-4: String Operations
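
Two of these functions are sketched in use below. The helper names
split_fields and base_name are hypothetical, chosen only for the
illustration. Note that strtok modifies the string it is given, so it
should not be handed a string constant.

```c
#include <string.h>

/* Hypothetical helper: split `line` in place into at most `max` fields
 * separated by blanks or tabs, storing a pointer to each field in
 * `fields`.  Returns the number of fields found.  strtok overwrites
 * the separators in `line` with null characters. */
int split_fields(char *line, char *fields[], int max)
{
    int n = 0;
    char *tok;

    for (tok = strtok(line, " \t"); tok != NULL && n < max;
         tok = strtok(NULL, " \t"))
        fields[n++] = tok;
    return n;
}

/* Hypothetical helper: return the part of a path name that follows
 * the last '/', or the whole string if it contains no '/'. */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash != NULL ? slash + 1 : path;
}
```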



Figure 2-5 lists macros that classify ASCII character-coded integer 
values. These macros are described under the heading ctype(3C) in Section 
3 of the Programmer's Reference Manual. 
Classify Characters

isalpha(c)     is c a letter

isupper(c)     is c an upper-case letter

islower(c)     is c a lower-case letter

isdigit(c)     is c a digit [0-9]

isxdigit(c)    is c a hexadecimal digit [0-9], [A-F] or [a-f]

isalnum(c)     is c an alphanumeric (letter or digit)

isspace(c)     is c a space, tab, carriage return, new-line, vertical tab
               or form-feed

ispunct(c)     is c a punctuation character (neither control nor
               alphanumeric)

isprint(c)     is c a printing character, code 040 (space) through
               0176 (tilde)

isgraph(c)     same as isprint except false for 040 (space)

iscntrl(c)     is c a control character (less than 040) or a delete
               character (0177)

isascii(c)     is c an ASCII character (code less than 0200)

For all functions: #include <ctype.h>
Nonzero return == true; zero return == false

Figure 2-5: Classifying ASCII Character-Coded Integer Values
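
A short sketch of the classification macros in use follows. The function
name classify is invented for the example. The cast to unsigned char is a
precaution we have added so that the macros are never handed a negative
value.

```c
#include <ctype.h>

/* Hypothetical helper: count the letters, digits, and white-space
 * characters in a null-terminated string. */
void classify(const char *s, int *letters, int *digits, int *spaces)
{
    *letters = *digits = *spaces = 0;
    for (; *s != '\0'; s++) {
        int c = (unsigned char)*s;  /* keep the value non-negative */

        if (isalpha(c))
            (*letters)++;
        else if (isdigit(c))
            (*digits)++;
        else if (isspace(c))
            (*spaces)++;
    }
}
```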



Figure 2-6 lists functions and macros that are used to convert characters, 
integers, or strings from one representation to another. 






Programming Language Interface/The UNIX System 



Function Name(s)              Purpose

a64l     l64a                 convert between long integer and
                              base-64 ASCII string

ecvt     fcvt     gcvt        convert floating-point number to
                              string

l3tol    ltol3                convert between 3-byte integer and
                              long integer

strtod   atof                 convert string to double-precision
                              number

strtol   atol     atoi        convert string to integer

conv(3C): Translate Characters

toupper      lower-case to upper-case

_toupper     macro version of toupper

tolower      upper-case to lower-case

_tolower     macro version of tolower

toascii      turn off all bits that are not part of a standard
             ASCII character; intended for compatibility
             with other systems

For all conv(3C) macros: #include <ctype.h>

Figure 2-6: Conversion Functions and Macros
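
As an illustration, the sketch below uses strtol and toupper. The helper
names parse_long and upcase are hypothetical. The range check assumes the
ANSI C behavior in which strtol sets errno to ERANGE on overflow; older
versions of the library may not do this.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert a decimal string to a long with strtol.
 * Returns 1 and stores the result in *out if the whole string was a
 * valid number in range; returns 0 otherwise. */
int parse_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return 0;           /* no digits, trailing junk, or overflow */
    *out = v;
    return 1;
}

/* Hypothetical helper: convert a string to upper case in place. */
void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

Unlike atoi and atol, strtol reports where conversion stopped, which is what
makes the check for trailing characters possible.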



Where the Manual Pages Can Be Found 

System calls are listed alphabetically in Section 2 of the Programmer's 
Reference Manual. Subroutines are listed in Section 3. We have described 
above what is in the first subsection of Section 3. The remaining subsec- 
tions of Section 3 are: 
■ 3M — functions that make up the Math Library, libm

■ 3X — various specialized functions

■ 3F — the FORTRAN intrinsic function library, libF77

■ 3N — Networking Support Utilities 

How System Calls and Subroutines Are Used in C Programs 

Information about the proper way to use system calls and subroutines is 
given on the manual page, but you have to know what you are looking for 
before it begins to make sense. To illustrate, a typical manual page (for 
gets(3S)) is shown in Figure 2-7. 
NAME
     gets, fgets - get a string from a stream

SYNOPSIS
     #include <stdio.h>

     char *gets (s)
     char *s;

     char *fgets (s, n, stream)
     char *s;
     int n;
     FILE *stream;

DESCRIPTION
     gets reads characters from the standard input stream, stdin, into the
     array pointed to by s, until a new-line character is read or an
     end-of-file condition is encountered. The new-line character is discarded
     and the string is terminated with a null character.

     fgets reads characters from the stream into the array pointed to by s,
     until n-1 characters are read, or a new-line character is read and
     transferred to s, or an end-of-file condition is encountered. The
     string is then terminated with a null character.

SEE ALSO
     ferror(3S), fopen(3S), fread(3S), getc(3S), scanf(3S).

DIAGNOSTICS
     If end-of-file is encountered and no characters have been read, no
     characters are transferred to s and a NULL pointer is returned. If a
     read error occurs, such as trying to use these functions on a file that
     has not been opened for reading, a NULL pointer is returned.
     Otherwise s is returned.

Figure 2-7: Manual Page for gets(3S)
As you can see from the illustration, two related functions are described
on this page: gets and fgets. Each function gets a string from a stream in a 
slightly different way. The DESCRIPTION section tells how each operates. 

It is the SYNOPSIS section, however, that contains the critical informa- 
tion about how the function (or macro) is used in your program. Notice in 
Figure 2-7 that the first line in the SYNOPSIS is 

#include <stdio.h> 

This means that to use gets or fgets you must bring the standard I/O header 
file into your program (generally right at the top of the file). There is something 
in stdio.h that is needed when you use the described functions. Figure 2-9 
shows a version of stdio.h. Check it to see if you can understand what gets 
or fgets uses. 

The next thing shown in the SYNOPSIS section of a manual page that docu- 
ments system calls or subroutines is the formal declaration of the function. The 
formal declaration tells you: 

■ the type of object returned by the function 

In our example, both gets and fgets return a character pointer. 

■ the object or objects the function expects to receive when called 

These are the things enclosed in the parentheses of the function.
gets expects a character pointer. (The DESCRIPTION section sheds
light on what the tokens of the formal declaration stand for.)

■ how the function is going to treat those objects 

The declaration 
char *s; 

in gets means that the token s enclosed in the parentheses will be 
considered to be a pointer to a character string. Bear in mind that in 
the C language, when passed as an argument, the name of an array is 
converted to a pointer to the beginning of the array. 

We have chosen a simple example here in gets. If you want to test 
yourself on something a little more complex, try working out the meaning 
of the elements of the fgets declaration. 
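
One way to see the three fgets parameters at work is a small wrapper like
the one sketched here. The name read_line is our own invention. The wrapper
relies only on what the manual page states: fgets stores at most n-1
characters, keeps the new-line if it read one, and returns a NULL pointer at
end-of-file or on a read error.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from `stream` into `buf`, which
 * has room for `size` characters.  fgets stores at most size-1
 * characters and keeps the new-line, so the new-line is removed here.
 * Returns buf, or NULL at end-of-file or on a read error. */
char *read_line(char *buf, int size, FILE *stream)
{
    char *nl;

    if (fgets(buf, size, stream) == NULL)
        return NULL;
    nl = strchr(buf, '\n');     /* absent if the line was too long */
    if (nl != NULL)
        *nl = '\0';
    return buf;
}
```

Because the caller states the size of the array, a long input line is cut
short rather than written past the end of the array, which gets has no way
to prevent.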
While we're on the subject of fgets, there is another piece of C esoterica
that we'll explain. Notice that the third parameter in the fgets declaration
is referred to as stream. A stream, in this context, is a file with its
associated buffering. It is declared to be a pointer to a defined type FILE. Where
is FILE defined? Right! In stdio.h.

To finish off this discussion of the way you use functions described in 
the Programmer's Reference Manual in your own code, in Figure 2-8 we show 
a program fragment in which gets is used. 




    #include <stdio.h>

    main()
    {
        char sarray[80];

        for (;;)
        {
            if (gets(sarray) != NULL)
            {
                /* Do something with the string */
            }
        }
    }




Figure 2-8: How gets Is Used in a Program 



You might ask, "Where is gets reading from?" The answer is, "From the 
standard input." That generally means from something being keyed in from 
the terminal where the command was entered to get the program running, 
or output from another command that was piped to gets. How do we know 
that? The DESCRIPTION section of the gets manual page says, "gets reads
characters from the standard input . . . ." Where is the standard input
defined? In stdio.h.
    #ifndef _NFILE
    #define _NFILE  20

    #define BUFSIZ  1024
    #define _SBFSIZ 8

    typedef struct {
            int             _cnt;
            unsigned char   *_ptr;
            unsigned char   *_base;
            char            _flag;
            char            _file;
    } FILE;

    #define _IOFBF    0000    /* _IOLBF means that a file's output */
    #define _IOREAD   0001    /* will be buffered line by line.    */
    #define _IOWRT    0002    /* In addition to being flags, _IONBF, */
    #define _IONBF    0004    /* _IOLBF and _IOFBF are possible    */
    #define _IOMYBUF  0010    /* values for "type" in setvbuf.     */
    #define _IOEOF    0020
    #define _IOERR    0040
    #define _IOLBF    0100
    #define _IORW     0200

    #ifndef NULL
    #define NULL      0
    #endif
    #ifndef EOF
    #define EOF       (-1)
    #endif

    #define stdin     (&_iob[0])
    #define stdout    (&_iob[1])
    #define stderr    (&_iob[2])

    #define _bufend(p)    _bufendtab[(p)->_file]
    #define _bufsiz(p)    (_bufend(p) - (p)->_base)

    #ifndef lint
    #define getc(p)       (--(p)->_cnt < 0 ? _filbuf(p) : (int) *(p)->_ptr++)
    #define putc(x, p)    (--(p)->_cnt < 0 ? \
                          _flsbuf((unsigned char) (x), (p)) : \
                          (int) (*(p)->_ptr++ = (unsigned char) (x)))
    #define getchar()     getc(stdin)
    #define putchar(x)    putc((x), stdout)
    #define clearerr(p)   ((void) ((p)->_flag &= ~(_IOERR | _IOEOF)))
    #define feof(p)       ((p)->_flag & _IOEOF)
    #define ferror(p)     ((p)->_flag & _IOERR)
    #define fileno(p)     (p)->_file
    #endif

    extern FILE     _iob[_NFILE];
    extern FILE     *fopen(), *fdopen(), *freopen(), *popen(), *tmpfile();
    extern long     ftell();
    extern void     rewind(), setbuf();
    extern char     *ctermid(), *cuserid(), *fgets(), *gets(), *tempnam(), *tmpnam();
    extern unsigned char *_bufendtab[];

    #define L_ctermid   9
    #define L_cuserid   9
    #define P_tmpdir    "/usr/tmp/"
    #define L_tmpnam    (sizeof(P_tmpdir) + 15)
    #endif

Figure 2-9: A Version of stdio.h



Header Files and Libraries 

In the earlier parts of this chapter there have been frequent references
to stdio.h, and a version of the file itself is shown in Figure 2-9. stdio.h is
the most commonly used header file in the UNIX system/C environment,
but there are many others.




Header files carry definitions and declarations that are used by more
than one function. Header filenames traditionally have the suffix .h, and
are brought into a program at compile time by the C preprocessor. The
preprocessor does this because it interprets the #include statement in your
program as a directive, as indeed it is. All keywords preceded by a pound
sign (#) at the beginning of the line are treated as preprocessor directives.
The two most commonly used directives are #include and #define. We
have already seen that the #include directive is used to call in (and process)
the contents of the named file. The #define directive is used to replace a
name with a token-string. For example,

#define _NFILE 20

sets to 20 the number of files a program can have open at one time. See
cpp(1) for the complete list of directives.

In the pages of the Programmer's Reference Manual there are about 45 
different .h files named. The format of the #include statement for all these 
shows the file name enclosed in angle brackets ( <> ), as in 

#include <stdio.h> 

The angle brackets tell the C preprocessor to look in the standard places 
for the file. In most systems the standard place is in the /usr/include direc- 
tory. If you have some definitions or external declarations that you want to 
make available in several files, you can create a .h file with any editor, store 
it in a convenient directory and make it the subject of a #include statement 
such as the following: 

#include "../defs/rec.h" 

It is necessary, in this case, to provide the relative pathname of the file
and enclose it in quotation marks (""). Fully-qualified pathnames (those that
begin with /) can create portability and organizational problems. An
alternative to long or fully-qualified pathnames is to use the -Idir
preprocessor option when you compile the program. This option directs the
preprocessor to search for #include files whose names are enclosed in
quotation marks ("") first in the directory of the file being compiled, then
in the directories named in the -I option(s), and finally in directories on
the standard list. In addition, all #include files whose names are enclosed
in angle brackets (<>) are first searched for in the list of directories
named in the -I option and finally in the directories on the standard list.

Object File Libraries

It is common practice in UNIX system computers to keep modules of
compiled code (object files) in archives, which by convention are designated
by a .a suffix. System calls from Section 2, and the subroutines in Section 3,
subsections 3C and 3S, of the Programmer's Reference Manual that are
functions (as distinct from macros) are kept in an archive file by the name of
libc.a. In most systems, libc.a is found in the directory /lib. Many systems
also have a directory /usr/lib. Where both /lib and /usr/lib occur, /usr/lib
is apt to be used to hold archives that are related to specific applications.



NOTE 



Besides archive libraries, the UNIX System also supports shared libraries. 
Shared libraries have advantages in terms of saving disk space and 
memory. For more information about using shared libraries, see Chapter 8 
"Shared Libraries" in this text. 



During the link edit phase of the compilation and link edit process,
copies of some of the object modules in an archive file are loaded with your
executable code. By default the cc command that invokes the C compilation
system causes the link editor to search libc.a. If you need to point the link
editor to other libraries that are not searched by default, you do it by
naming them explicitly on the command line with the -l option. The format of
the -l option is -lx, where x is the library name, which can be up to nine
characters long. For example, if your program includes functions from the
curses screen control package, the option

-lcurses

will cause the link editor to search for /lib/libcurses.a or
/usr/lib/libcurses.a and use the first one it finds to resolve references in
your program.

In cases where you want to direct the order in which archive libraries
are searched, you may use the -L dir option. Assuming the -L option
appears on the command line ahead of the -l option, it directs the link
editor to search the named directory for libx.a before looking in /lib and
/usr/lib. This is particularly useful if you are testing out a new version of a
function that already exists in an archive in a standard directory. It works
because the link editor stops looking once it has resolved a reference.
That's why the -L option, if used, should appear on the command
line ahead of any -l specification.



Input/Output 

We talked some about I/O earlier in this chapter in connection with sys- 
tem calls and subroutines. A whole set of subroutines constitutes the C 
language standard I/O package, and there are several system calls that deal 
with the same area. In this section we want to get into the subject in a little 
more detail and describe for you how to deal with input and output con- 
cerns in your C programs. First off, let's briefly define what the subject of 
I/O encompasses. It has to do with 

■ creating and sometimes removing files 

■ opening and closing files used by your program 

■ transferring information from a file to your program (reading) 

■ transferring information from your program to a file (writing) 

In this section we will describe some of the subroutines you might 
choose for transferring information, but the heaviest emphasis will be on 
dealing with files. 

Three Files You Always Have 

Programs are permitted to have several files open simultaneously. The 
number may vary from system to system; the most common maximum is 20. 
_NFILE in stdio.h specifies the number of standard I/O FILEs a program is 
permitted to have open. 

Any program automatically starts off with three files. If you will look
again at Figure 2-9, about midway through you will see that stdio.h
contains three #define directives that equate stdin, stdout, and stderr to the
address of _iob[0], _iob[1], and _iob[2], respectively. The array _iob holds
information dealing with the way standard I/O handles streams. It is a
representation of the open file table in the control block for your program.
The position in the array is a digit that is also known as the file descriptor.
The default in UNIX systems is to associate all three of these files with your
terminal.

The real significance is that functions and macros that deal with stdin
or stdout can be used in your program with no further need to open or
close files. For example, gets, cited above, reads a string from stdin; puts
writes a null-terminated string to stdout. There are others that do the same
in slightly different ways (a character at a time, formatted, and so on). You
can specify that output be directed to stderr by using a function such as
fprintf. fprintf works the same as printf except that it delivers its formatted
output to a named stream, such as stderr. You can use the shell's redirection
feature on the command line to read from or write into a named file. If you
want to separate error messages from ordinary output being sent to stdout
and thence possibly piped by the shell to a succeeding program, you can do
it by using one function to handle the ordinary output and a variation of
the same function that names the stream to handle error messages.

Named Files 

Any files other than stdin, stdout, and stderr that are to be used by 
your program must be explicitly connected by you before the file can be 
read from or written to. This can be done using the standard library rou- 
tine fopen. fopen takes a pathname (which is the name by which the file is 
known to the UNIX file system), asks the system to keep track of the con- 
nection, and returns a pointer that you then use in functions that do the 
reads and writes. 

A structure is defined in stdio.h with a type of FILE. In your program
you need to have a declaration such as

FILE *fin;

The declaration says that fin is a pointer to a FILE. You can then assign the
name of a particular file to the pointer with a statement in your program
like this:

fin = fopen("filename", "r");

where filename is the pathname to open. The "r" means that the file is to
be opened for reading. This argument is known as the mode. As you
might suspect, there are modes for reading, writing, and both reading and
writing. Actually, the file open function is often included in an if
statement such as:

if ((fin = fopen("filename", "r")) == NULL)
        (void)fprintf(stderr, "%s: Unable to open input file %s\n", argv[0], "filename");

that takes advantage of the fact that fopen returns a NULL pointer if it can't
open the file.

Once the file has been successfully opened, the pointer fin is used in
functions (or macros) to refer to the file. For example:

int c;

c = getc(fin);

brings in a character at a time from the file into an integer variable called c.
The variable c is declared as an integer even though we are reading
characters because the function getc() returns an integer. Getting a character
is often incorporated into some flow-of-control mechanism such as:

while ((c = getc(fin)) != EOF)



that reads through the file until EOF is returned. EOF, NULL, and the 
macro getc are all defined in stdio.h. getc and others that make up the 
standard I/O package keep advancing a pointer through the buffer associ- 
ated with the file; the UNIX system and the standard I/O subroutines are 
responsible for seeing that the buffer is refilled (or written to the output file 
if you are producing output) when the pointer reaches the end of the 
buffer. All these mechanics are mercifully invisible to the program and the 
programmer. 

The function fclose is used to break the connection between the pointer 
in your program and the pathname. The pointer may then be associated 
with another file by another call to fopen. This re-use of a file descriptor 
for a different stream may be necessary if your program has many files to 
open. For output files it is good to issue an fclose call because the call 
makes sure that all output has been sent from the output buffer before 
disconnecting the file. The system call exit closes all open files for you. It 
also gets you completely out of your process, however, so it is safe to use 
only when you are sure you are completely finished. 

Low-level I/O and Why You Shouldn't Use It 

The term low-level I/O is used to refer to the process of using system
calls from Section 2 of the Programmer's Reference Manual rather than the
functions and subroutines of the standard I/O package. We are going to
postpone until Chapter 3 any discussion of when this might be
advantageous. If you find as you go through the information in this chapter
that it is a good fit with the objectives you have as a programmer, it is a safe
assumption that you can work with C language programs in the UNIX
system for a good many years without ever having a real need to use system
calls to handle your I/O and file accessing problems. Low-level I/O is
perilous because it is more system-dependent: your programs are
less portable and probably no more efficient.



System Calls for Environment or Status Information 

Under some circumstances you might want to be able to monitor or con- 
trol the environment in your computer. There are system calls that can be 
used for this purpose. Some of them are shown in Figure 2-10. 





Function Name(s)              Purpose

chdir                         change working directory
chmod                         change access permission of a file
chown                         change owner and group of a file
getpid   getpgrp   getppid    get process IDs
getuid   geteuid   getgid     get user IDs
ioctl                         control device
link     unlink               add or remove a directory entry
mount    umount               mount or unmount a file system
nice                          change priority of a process
stat     fstat                get file status
time                          get time
ulimit                        get and set user limits
uname                         get name of current UNIX system



Figure 2-10: Environment and Status System Calls

As you can see, many of the functions shown in Figure 2-10 have
equivalent UNIX system shell commands. Shell commands can easily be
incorporated into shell scripts to accomplish the monitoring and control
tasks you may need to do. The functions are available, however, and may
be used in C programs as part of the UNIX system/C Language interface.
They are documented in Section 2 of the Programmer's Reference Manual.



Processes 

Whenever you execute a command in the UNIX system you are initiat- 
ing a process that is numbered and tracked by the operating system. A 
flexible feature of the UNIX system is that processes can be generated by 
other processes. This happens more than you might ever be aware of. For 
example, when you log in to your system you are running a process, very 
probably the shell. If you then use an editor such as vi, take the option of 
invoking the shell from vi, and execute the ps command, you will see a 
display something like that in Figure 2-11 (which shows the results of a
ps -f command):




UID     PID   PPID   C     STIME    TTY   TIME  COMMAND
abc   24210      1   0  06:13:14  tty29   0:05  -sh
abc   24631  24210   0  06:59:07  tty29   0:13  vi c2.uli
abc   28441  28358  80  09:17:22  tty29   0:01  ps -f
abc   28358  24631   2  09:15:14  tty29   0:01  sh -i


Figure 2-11: Process Status 



As you can see, user abc (who went through the steps described above) 
now has four processes active. It is an interesting exercise to trace the chain 
that is shown in the Process ID (PID) and Parent Process ID (PPID) 
columns. The shell that was started when user abc logged on is Process 
24210; its parent is the initialization process (Process ID 1). Process 24210 is 
the parent of Process 24631, and so on.

The four processes in the example above are all UNIX system shell level
commands, but you can spawn new processes from your own program. 
(Actually, when you issue the command from your terminal to execute a 
program you are asking the shell to start another process, the process being 
your executable object module with all the functions and subroutines that 
were made a part of it by the link editor.) 

You might think, "Well, it's one thing to switch from one program to 
another when I'm at my terminal working interactively with the computer; 
but why would a program want to run other programs, and if one does, 
why wouldn't I just put everything together into one big executable 
module?" 

Overlooking the case where your program is itself an interactive appli- 
cation with diverse choices for the user, your program may need to run one 
or more other programs based on conditions it encounters in its own pro- 
cessing. (If it's the end of the month, go do a trial balance, for example.) 
The usual reasons why it might not be practical to create one monster exe- 
cutable are: 

■ The load module may get too big to fit in the maximum process size
for your system.

■ You may not have control over the object code of all the other
modules you want to include.

Suffice it to say, there are legitimate reasons why this creation of new
processes might need to be done. There are three ways to do it:

■ system(3S) -- request the shell to execute a command

■ exec(2) -- stop this process and start another

■ fork(2) -- start an additional copy of this process

system(3S) 

The formal declaration of the system function looks like this: 

#include <stdio.h> 

int system(string)
char *string;

The function asks the shell to treat the string as a command line. The
string can therefore be the name and arguments of any executable program
or UNIX system shell command. If the exact arguments vary from one
execution to the next, you may want to use sprintf to format the string before
issuing the system command. When the command has finished running,
system returns the shell exit status to your program. Execution of your
program waits for the completion of the command initiated by system and
then picks up again at the next executable statement.

exec(2) 

exec is the name of a family of functions that includes execl, execv,
execle, execve, execlp, and execvp. They all have the function of
transforming the calling process into a new process. The reason for the
variety is to provide different ways of pulling together and presenting the
arguments of the function. An example of one version (execl) might be:

execl("/bin/prog2", "prog", progarg1, progarg2, (char *)0);

For execl the argument list is

/bin/prog2      path name of the new process file

prog            the name the new process gets in its argv[0]

progarg1,       arguments to prog2 as char *'s
progarg2

(char *)0       a null char pointer to mark the end of the arguments

Check the manual page in the Programmer's Reference Manual for the rest
of the details. The key point of the exec family is that there is no return
from a successful execution: the calling process is finished, the new process
overlays the old. The new process also takes over the Process ID and other
attributes of the old process. If the call to exec is unsuccessful, control is
returned to your program with a return value of -1. You can check errno
(see below) to learn why it failed.

fork(2) 

The fork system call creates a new process that is an exact copy of the 
calling process. The new process is known as the child process; the caller is 
known as the parent process. The one major difference between the two 
processes is that the child gets its own unique process ID. When the fork 
process has completed successfully, it returns a 0 to the child process and
the child's process ID to the parent. If the idea of having two identical
processes seems a little funny, consider this:

■ Because the return value is different between the child process and
the parent, the program can contain the logic to determine different
paths.

■ The child process could say, "Okay, I'm the child. I'm supposed to
issue an exec for an entirely different program."

■ The parent process could say, "My child is going to be execing a new
process. I'll issue a wait until I get word that that process is
finished."

To take this out of the storybook world where programs talk like people
and into the world of C programming (where people talk like programs),
your code might include statements like this:

int ch_stat, ch_pid, status;
char *progarg1;
char *progarg2;
void exit();
extern int errno;

if ((ch_pid = fork()) < 0)
{
        /* Could not fork...
           check errno
        */
}
else if (ch_pid == 0)           /* child */
{
        (void)execl("/bin/prog2", "prog", progarg1, progarg2, (char *)0);
        exit(2);        /* execl() failed */
}
else                            /* parent */
{
        while ((status = wait(&ch_stat)) != ch_pid)
        {
                if (status < 0 && errno == ECHILD)
                        break;
                errno = 0;
        }
}

Figure 2-12: Example of fork 



Because the child process ID is taken over by the new exec'd process,
the parent knows the ID. What this boils down to is a way of leaving one
program to run another, returning to the point in the first program where
processing left off. This is exactly what the system(3S) function does. As a
matter of fact, system accomplishes it through this same procedure of
forking and execing, with a wait in the parent.

Keep in mind that the fragment of code above includes a minimum
amount of checking for error conditions. There is also potential confusion 
about open files and which program is writing to a file. Leaving out the 
possibility of named files, the new process created by the fork or exec has 
the three standard files that are automatically opened: stdin, stdout, and 
stderr. If the parent has buffered output that should appear before output 
from the child, the buffers must be flushed before the fork. Also, if the 
parent and the child process both read input from a stream, whatever is 
read by one process will be lost to the other. That is, once something has 
been delivered from the input buffer to a process the pointer has moved on. 

Pipes 

The idea of using pipes, a connection between the output of one pro- 
gram and the input of another, when working with commands executed by 
the shell is well established in the UNIX system environment. For example, 
to learn the number of archive files in your system you might enter a com- 
mand like: 

echo /lib/*.a /usr/lib/*.a | wc -w 

that first echoes all the files in /lib and /usr/lib that end in .a, then pipes 
the results to the wc command, which counts their number. 

A feature of the UNIX system /C Language interface is the ability to 
establish pipe connections between your process and a command to be exe- 
cuted by the shell, or between two cooperating processes. The first uses the 
popen(3S) subroutine that is part of the standard I/O package; the second 
requires the system call pipe(2). 

popen is similar in concept to the system subroutine in that it causes 
the shell to execute a command. The difference is that once having invoked 
popen from your program, you have established an open line to a con- 
currently running process through a stream. You can send characters or 
strings to this stream with standard I/O subroutines just as you would to 
stdout or to a named file. The connection remains open until your program 
invokes the companion pclose subroutine. A common application of this 
technique might be a pipe to a printer spooler. For example:

#include <stdio.h>

main()
{
        FILE *pptr;
        char *outstring;

        if ((pptr = popen("lp", "w")) != NULL)
        {
                for(;;)
                {
                        /* Organize output */
                        (void)fprintf(pptr, "%s\n", outstring);
                        /* Leave the loop when there is no more output */
                }
                pclose(pptr);
        }
}

Figure 2-13: Example of a popen pipe 



Error Handling

Within your C programs you must determine the appropriate level of 
checking for valid data and for acceptable return codes from functions and 
subroutines. If you use any of the system calls described in Section 2 of the 
Programmer's Reference Manual, you have a way in which you can find out 
the probable cause of a bad return value.

UNIX system calls that are not able to complete successfully almost
always return a value of -1 to your program. (If you look through the
system calls in Section 2, you will see that there are a few calls for which no
return value is defined, but they are the exceptions.) In addition to the -1
that is returned to the program, the unsuccessful system call places an
integer in an externally declared variable, errno. You can determine the
value in errno if your program contains the statement

#include <errno.h>

The value in errno is not cleared on successful calls, so your program 
should check it only if the system call returned a -1. The errors are 
described in intro(2) of the Programmer's Reference Manual. 

The subroutine perror(3C) can be used to print an error message (on 
stderr) based on the value of errno. 



Signals and Interrupts 

Signals and interrupts are two words for the same thing. Both words 
refer to messages passed by the UNIX system to running processes. Gen- 
erally, the effect is to cause the process to stop running. Some signals are 
generated if the process attempts to do something illegal; others can be ini- 
tiated by a user against his or her own processes, or by the super-user 
against any process. 

There is a system call, kill, that you can include in your program to
send signals to other processes running under your user-id. The format for
the kill call is:

kill(pid, sig)

where pid is the process number against which the call is directed, and sig
is an integer from 1 to 19 that shows the intent of the message. The name
"kill" is something of an overstatement; not all the messages have a "drop
dead" meaning. Some of the available signals are shown in Figure 2-14 as
they are defined in <sys/signal.h>.



#define SIGHUP    1     /* hangup */
#define SIGINT    2     /* interrupt (rubout) */
#define SIGQUIT   3     /* quit (ASCII FS) */
#define SIGILL    4     /* illegal instruction (not reset when caught)*/
#define SIGTRAP   5     /* trace trap (not reset when caught) */
#define SIGIOT    6     /* IOT instruction */
#define SIGABRT   6     /* used by abort, replace SIGIOT in the future */
#define SIGEMT    7     /* EMT instruction */
#define SIGFPE    8     /* floating point exception */
#define SIGKILL   9     /* kill (cannot be caught or ignored) */
#define SIGBUS    10    /* bus error */
#define SIGSEGV   11    /* segmentation violation */
#define SIGSYS    12    /* bad argument to system call */
#define SIGPIPE   13    /* write on a pipe with no one to read it */
#define SIGALRM   14    /* alarm clock */
#define SIGTERM   15    /* software termination signal from kill */
#define SIGUSR1   16    /* user defined signal 1 */
#define SIGUSR2   17    /* user defined signal 2 */
#define SIGCLD    18    /* death of a child */
#define SIGPWR    19    /* power-fail restart */

                        /* SIGWIND and SIGPHONE only used in UNIX/PC */
/*#define SIGWIND   20 */
/*#define SIGPHONE  21 */ /* handset, line status change */

#define SIGPOLL   22    /* pollable event occurred */

#define NSIG      23    /* The valid signal number is from 1 to NSIG-1 */
#define MAXSIG    32    /* size of u_signal[], NSIG-1 <= MAXSIG*/
                        /* MAXSIG is larger than we need now. */
                        /* In the future, we can add more signal */
                        /* number without changing user.h */

Figure 2-14: Signal Numbers Defined in /usr/include/sys/signal.h



The signal(2) system call is designed to let you code methods of dealing
with incoming signals. You have a three-way choice. You can a) accept 
whatever the default action is for the signal, b) have your program ignore 
the signal, or c) write a function of your own to deal with it. 

The UNIX system provides several commands designed to help you
discover the causes of problems in programs and to learn about potential
problems.



Sample Program 

To illustrate how these commands are used and the type of output they
produce, we have constructed a sample program that opens and reads an
input file and performs one to three subroutines according to options
specified on the command line. This program does not do anything you
couldn't do quite easily on your pocket calculator, but it does serve to
illustrate some points. The source code is shown in Figure 2-15. The header
file, recdef.h, is shown at the end of the source code.

The output produced by the various analysis and debugging tools
illustrated in this section may vary slightly from one installation to another.
The Programmer's Reference Manual is a good source of additional
information about the contents of the reports.



/* Main module -- restate.c */

#include <stdio.h>
#include "recdef.h"

#define TRUE    1
#define FALSE   0

main(argc, argv)
int argc;
char *argv[];
{
        FILE *fopen(), *fin;
        void exit();
        int getopt();
        int oflag = FALSE;
        int pflag = FALSE;
        int rflag = FALSE;
        int ch;
        struct rec first;
        extern int opterr;
        extern float oppty(), pft(), rfe();

        if (argc < 2)
        {
                (void) fprintf(stderr, "%s: Must specify option\n", argv[0]);
                (void) fprintf(stderr, "Usage: %s -rpo\n", argv[0]);
                exit(2);
        }

        opterr = FALSE;
        while ((ch = getopt(argc, argv, "opr")) != EOF)
        {
                switch(ch)
                {
                case 'o':
                        oflag = TRUE;
                        break;
                case 'p':
                        pflag = TRUE;
                        break;
                case 'r':
                        rflag = TRUE;
                        break;
                default:
                        (void) fprintf(stderr, "Usage: %s -rpo\n", argv[0]);
                        exit(2);
                }
        }

        if ((fin = fopen("info", "r")) == NULL)
        {
                (void) fprintf(stderr, "%s: cannot open input file %s\n", argv[0], "info");
                exit(2);
        }

        if (fscanf(fin, "%s%f%f%f%f%f%f", first.pname, &first.ppx,
                &first.dp, &first.i, &first.c, &first.t, &first.spx) != 7)
        {
                (void) fprintf(stderr, "%s: cannot read first record from %s\n",
                        argv[0], "info");
                exit(2);
        }

        printf("Property: %s\n", first.pname);
        if (oflag)
                printf("Opportunity Cost: $%#5.2f\n", oppty(&first));
        if (pflag)
                printf("Anticipated Profit(loss): $%#7.2f\n", pft(&first));
        if (rflag)
                printf("Return on Funds Employed: %#3.2f%%\n", rfe(&first));
}
/* End of Main Module -- restate.c */

/* Opportunity Cost -- oppty.c */

#include "recdef.h"

float
oppty(ps)
struct rec *ps;
{
        return(ps->i/12 * ps->t * ps->dp);
}

/* Profit -- pft.c */

#include "recdef.h"

float
pft(ps)
struct rec *ps;
{
        return(ps->spx - ps->ppx + ps->c);
}

/* Return on Funds Employed -- rfe.c */

#include "recdef.h"

float
rfe(ps)
struct rec *ps;
{
        return(100 * (ps->spx - ps->c) / ps->spx);
}

/* Header File -- recdef.h */

struct rec {            /* To hold input */
        char pname[25];
        float ppx;
        float dp;
        float i;
        float c;
        float t;
        float spx;
};

Figure 2-15: Source Code for Sample Program



cflow

cflow produces a chart of the external references in C, yacc, lex, and 
assembly language files. Using the modules of our sample program, the 
command 

cflow restate.c oppty.c pft.c rfe.c 

produces the output shown in Figure 2-16. 



1   main: int(), <restate.c 11>
2       fprintf: <>
3       exit: <>
4       getopt: <>
5       fopen: <>
6       fscanf: <>
7       printf: <>
8       oppty: float(), <oppty.c 7>
9       pft: float(), <pft.c 7>
10      rfe: float(), <rfe.c 8>

Figure 2-16: cflow Output, No Options 
The -r option looks at the caller:callee relationship from the other side.
It produces the output shown in Figure 2-17. 



1   exit: <>
2       main : <>
3   fopen: <>
4       main : 2
5   fprintf: <>
6       main : 2
7   fscanf: <>
8       main : 2
9   getopt: <>
10      main : 2
11  main: int(), <restate.c 11>
12  oppty: float(), <oppty.c 7>
13      main : 2
14  pft: float(), <pft.c 7>
15      main : 2
16  printf: <>
17      main : 2
18  rfe: float(), <rfe.c 8>
19      main : 2



Figure 2-17: cflow Output, Using -r Option 
The -ix option causes external and static data symbols to be included.
Our sample program has only one such symbol, opterr. The output is 
shown in Figure 2-18. 



1   main: int(), <restate.c 11>
2       fprintf: <>
3       exit: <>
4       opterr: <>
5       getopt: <>
6       fopen: <>
7       fscanf: <>
8       printf: <>
9       oppty: float(), <oppty.c 7>
10      pft: float(), <pft.c 7>
11      rfe: float(), <rfe.c 8>



Figure 2-18: cflow Output, Using -ix Option 
Combining the -r and the -ix options produces the output shown in
Figure 2-19.



1   exit: <>
2       main : <>
3   fopen: <>
4       main : 2
5   fprintf: <>
6       main : 2
7   fscanf: <>
8       main : 2
9   getopt: <>
10      main : 2
11  main: int(), <restate.c 11>
12  oppty: float(), <oppty.c 7>
13      main : 2
14  opterr: <>
15      main : 2
16  pft: float(), <pft.c 7>
17      main : 2
18  printf: <>
19      main : 2
20  rfe: float(), <rfe.c 8>
21      main : 2

Figure 2-19: cflow Output, Using -r and -ix Options



ctrace

ctrace lets you follow the execution of a C program statement by statement.
ctrace takes a .c file as input and inserts statements in the source
code to print out variables as each program statement is executed. You must
direct the output of this process to a temporary .c file. The temporary file is
then used as input to cc. When the resulting a.out file is executed it produces
output that can tell you a lot about what is going on in your program.


Options give you the ability to limit the number of times through loops.
You can also include functions in your source file that turn the trace off and
on so you can limit the output to portions of the program that are of particular
interest.
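
As a sketch of what turning the trace off and on can look like, the function below suppresses tracing around a loop that would otherwise flood the output. We are assuming the control functions are named ctroff() and ctron() and that ctrace defines the preprocessor symbol CTRACE when it processes the file; check the ctrace manual page on your system before relying on either. The function sum_to() is a hypothetical example and is not part of the sample program.

```c
#include <stdio.h>

/* Add up the integers 1..n. Under ctrace every pass through the loop
   would be printed, so tracing is switched off around it. */
long sum_to(int n)
{
    long total = 0;
    int k;

#ifdef CTRACE           /* assumed to be defined only by ctrace */
    ctroff();           /* stop tracing here */
#endif
    for (k = 1; k <= n; k++)
        total += k;
#ifdef CTRACE
    ctron();            /* resume tracing */
#endif
    return total;
}
```

Because the calls are guarded by #ifdef, the same source file still compiles and behaves normally when it is passed directly to cc.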

ctrace accepts only one source code file as input. To use our sample 
program to illustrate, it is necessary to execute the following four com- 
mands: 

ctrace restate.c > ct.main.c 
ctrace oppty.c > ct.op.c 
ctrace pft.c > ct.p.c
ctrace rfe.c > ct.r.c 



The names of the output files are completely arbitrary. Use any names 
that are convenient for you. The names must end in .c, since the files are 
used as input to the C compilation system. 

cc -o ct.run ct.main.c ct.op.c ct.p.c ct.r.c

Now the command 

ct.run -opr

produces the output shown in Figure 2-20. The command above will cause 
the output to be directed to your terminal (stdout). It is probably a good 
idea to direct it to a file or to a printer so you can refer to it. 



 8 main(argc, argv)
23 if (argc < 2)
   /* argc == 2 */
30 opterr = FALSE;
   /* FALSE == 0 */
   /* opterr == 0 */
31 while ((ch = getopt(argc,argv,"opr")) != EOF)
   /* argc == 2 */
   /* argv == 15729316 */
   /* ch == 111 or 'o' or "t" */
32 {
33 switch(ch)
   /* ch == 111 or 'o' or "t" */
35 case 'o':
36 oflag = TRUE;
   /* TRUE == 1 or "h" */
37 break;
48 }
31 while ((ch = getopt(argc,argv,"opr")) != EOF)
   /* argc == 2 */
   /* argv == 15729316 */
   /* ch == 112 or 'p' */
32 {
33 switch(ch)
   /* ch == 112 or 'p' */
38 case 'p':
39 pflag = TRUE;
   /* TRUE == 1 or "h" */
   /* pflag == 1 or "h" */
40 break;
48 }
31 while ((ch = getopt(argc,argv,"opr")) != EOF)
   /* argc == 2 */
   /* argv == 15729316 */
   /* ch == 114 or 'r' */
32 {
33 switch(ch)
   /* ch == 114 or 'r' */
41 case 'r':
42 rflag = TRUE;
   /* TRUE == 1 or "h" */
   /* rflag == 1 or "h" */
43 break;
48 }
31 while ((ch = getopt(argc,argv,"opr")) != EOF)
   /* argc == 2 */
   /* argv == 15729316 */
   /* ch == -1 */
49 if ((fin = fopen("info","r")) == NULL)
   /* fin == 140200 */
54 if (fscanf(fin, "%s%f%f%f%f%f%f",first.pname,&first.ppx,
       &first.dp,&first.i,&first.c,&first.t,&first.spx) != 7)
   /* fin == 140200 */
   /* first.pname == 15729528 */
61 printf("Property: %s\n",first.pname);
   /* first.pname == 15729528 or "Linden Place" */ Property: Linden Place

63 if(oflag)
   /* oflag == 1 or "h" */
64 printf("Opportunity Cost: $%#5.2f\n",oppty(&first));
 5 oppty(ps)
 8 return(ps->i/12 * ps->t * ps->dp);
   /* ps->i == 1069044203 */
   /* ps->t == 1076494336 */
   /* ps->dp == 1088765312 */ Opportunity Cost: $4476.87

66 if(pflag)
   /* pflag == 1 or "h" */
67 printf("Anticipated Profit(loss): $%#7.2f\n",pft(&first));
 5 pft(ps)
 8 return(ps->spx - ps->ppx + ps->c);
   /* ps->spx == 1091649040 */
   /* ps->ppx == 1091178464 */
   /* ps->c == 1087409536 */ Anticipated Profit(loss): $85950.00

69 if(rflag)
   /* rflag == 1 or "h" */
70 printf("Return on Funds Employed: %#3.2f%%\n",rfe(&first));
 6 rfe(ps)
 9 return(100 * (ps->spx - ps->c) / ps->spx);
   /* ps->spx == 1091649040 */
   /* ps->c == 1087409536 */ Return on Funds Employed: 94.00%

   /* return */



Figure 2-20: ctrace Output 



Using a program that runs successfully is not the optimal way to 
demonstrate ctrace. It would be more helpful to have an error in the opera- 
tion that could be detected by ctrace. It would seem that this utility might 
be most useful in cases where the program runs to completion, but the out- 
put is not as expected. 
cxref

cxref analyzes a group of C source code files and builds a cross- 
reference table of the automatic, static, and global symbols in each file. 

The command 

cxref -c -o cx.op restate.c oppty.c pft.c rfe.c

produces the output shown in Figure 2-21 in a file named, in this case,
cx.op. The -c option causes the reports for the four .c files to be combined
in one cross-reference file.



restate.c:
oppty.c:
pft.c:
rfe.c:

SYMBOL        FILE                   FUNCTION   LINE

BUFSIZ        /usr/include/stdio.h   --         *9
EOF           /usr/include/stdio.h   --         49 *50
              restate.c              --         31
FALSE         restate.c              --         *6 15 16 17 30
FILE          /usr/include/stdio.h   --         *29 73 74
              restate.c              main       12
L_ctermid     /usr/include/stdio.h   --         *80
L_cuserid     /usr/include/stdio.h   --         *81
L_tmpnam      /usr/include/stdio.h   --         *83
NULL          /usr/include/stdio.h   --         46 *47
              restate.c              --         49
P_tmpdir      /usr/include/stdio.h   --         *82
TRUE          restate.c              --         *5 36 39 42
_IOEOF        /usr/include/stdio.h   --         *41
_IOERR        /usr/include/stdio.h   --         *42
_IOFBF        /usr/include/stdio.h   --         *36
_IOLBF        /usr/include/stdio.h   --         *43
_IOMYBUF      /usr/include/stdio.h   --         *40
_IONBF        /usr/include/stdio.h   --         *39
_IOREAD       /usr/include/stdio.h   --         *37
_IORW         /usr/include/stdio.h   --         *44
_IOWRT        /usr/include/stdio.h   --         *38
_NFILE        /usr/include/stdio.h   --         2 *3 73
_SBFSIZ       /usr/include/stdio.h   --         *16
_base         /usr/include/stdio.h   --         *26
_bufend()
              /usr/include/stdio.h   --         *57
_bufendtab    /usr/include/stdio.h   --         *78
_bufsiz()
              /usr/include/stdio.h   --         *58
_cnt          /usr/include/stdio.h   --         *20
_file         /usr/include/stdio.h   --         *28
_flag         /usr/include/stdio.h   --         *27
_iob          /usr/include/stdio.h   --         *73
              restate.c              main       25 26 45 51 57
_ptr          /usr/include/stdio.h   --         *21
argc          restate.c              --         8
              restate.c              main       *9 23 31
argv          restate.c              --         8
              restate.c              main       *10 25 26 31 45 51 57
c             ./recdef.h             --         *6
              pft.c                  pft        8
              restate.c              main       55
              rfe.c                  rfe        9
ch            restate.c              main       *18 31 33
clearerr()
              /usr/include/stdio.h   --         *67
ctermid()
              /usr/include/stdio.h   --         *77
cuserid()
              /usr/include/stdio.h   --         *77
dp            ./recdef.h             --         *4
              oppty.c                oppty      8
              restate.c              main       55
exit()
              restate.c              main       *13 27 46 52 58
fdopen()
              /usr/include/stdio.h   --         *74
feof()
              /usr/include/stdio.h   --         *68
ferror()
              /usr/include/stdio.h   --         *69
fgets()
              /usr/include/stdio.h   --         *77
fileno()
              /usr/include/stdio.h   --         *70
fin           restate.c              main       *12 49 54
first         restate.c              main       *19 54 55 61 64 67 70
fopen()
              /usr/include/stdio.h   --         *74
              restate.c              main       12 49
fprintf       restate.c              main       25 26 45 51 57
freopen()
              /usr/include/stdio.h   --         *74
fscanf        restate.c              main       54
ftell()
              /usr/include/stdio.h   --         *75
getc()
              /usr/include/stdio.h   --         *61
getchar()
              /usr/include/stdio.h   --         *65
getopt()
              restate.c              main       *14 31
gets()
              /usr/include/stdio.h   --         *77
i             ./recdef.h             --         *5
              oppty.c                oppty      8
              restate.c              main       55
lint          /usr/include/stdio.h   --         60
main()
              restate.c              --         *8
oflag         restate.c              main       *15 36 63
oppty()
              oppty.c                --         *5
              restate.c              main       *21 64
opterr        restate.c              main       *20 30
p             /usr/include/stdio.h   --         *57 *58 *61 62 *62 63 64 67 *67 68 *68 69 *69 70 *70
pdp11         /usr/include/stdio.h   --         11
pflag         restate.c              main       *16 39 66
pft()
              pft.c                  --         *5
              restate.c              main       *21 67
pname         ./recdef.h             --         *2
              restate.c              main       54 61
popen()
              /usr/include/stdio.h   --         *74
ppx           ./recdef.h             --         *3
              pft.c                  pft        8
              restate.c              main       54
printf        restate.c              main       61 64 67 70
ps            oppty.c                --         5
              oppty.c                oppty      *6 8
              pft.c                  --         5
              pft.c                  pft        *6 8
              rfe.c                  --         6
              rfe.c                  rfe        *7 9
putc()
              /usr/include/stdio.h   --         *62
putchar()
              /usr/include/stdio.h   --         *66
rec           ./recdef.h             --         *1
              oppty.c                oppty      6
              pft.c                  pft        6
              restate.c              main       19
              rfe.c                  rfe        7
rewind()
              /usr/include/stdio.h   --         *76
rfe()
              restate.c              main       *21 70
              rfe.c                  --         *6
rflag         restate.c              main       *17 42 69
setbuf()
              /usr/include/stdio.h   --         *76
spx           ./recdef.h             --         *8
              pft.c                  pft        8
              restate.c              main       55
              rfe.c                  rfe        9
stderr        /usr/include/stdio.h   --         *55
              restate.c              --         25 26 45 51 57
stdin         /usr/include/stdio.h   --         *53
stdout        /usr/include/stdio.h   --         *54
t             ./recdef.h             --         *7
              oppty.c                oppty      8
              restate.c              main       55
tempnam()
              /usr/include/stdio.h   --         *77
tmpfile()
              /usr/include/stdio.h   --         *74
tmpnam()
              /usr/include/stdio.h   --         *77
u370          /usr/include/stdio.h   --         5
u3b           /usr/include/stdio.h   --         19
u3b5          /usr/include/stdio.h   --         19
vax           /usr/include/stdio.h   --         3 9
x             /usr/include/stdio.h   --         *62 63 64 66 *66

Figure 2-21: cxref Output, Using -c Option
lint

lint looks for features in a C program that are apt to cause execution 
errors, that are wasteful of resources, or that create problems of portability. 

The command 

lint restate.c oppty.c pft.c rfe.c

produces the output shown in Figure 2-22. 




restate.c
(71)  warning: main() returns random value to invocation environment
oppty.c:
pft.c:
rfe.c:

function returns value which is always ignored
    printf




Figure 2-22: lint Output 



lint has options that will produce additional information. Check the 
User's Reference Manual. The error messages give you the line numbers of 
some items you may want to review. 
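
Both complaints in Figure 2-22 have conventional remedies, sketched below under our own assumptions: a function that returns an explicit status instead of falling off the end, and a (void) cast to show that a return value is being discarded on purpose. The function show_property() is a hypothetical stand-in; it is not part of restate.c.

```c
#include <stdio.h>

/* Print the property name and report success to the caller. */
int show_property(const char *name)
{
    /* The (void) cast states that ignoring printf's value is intentional,
       the same idiom restate.c already uses for fprintf. */
    (void) printf("Property: %s\n", name);

    /* An explicit return value; a function declared int that simply runs
       off its end hands an unpredictable value back to its caller. */
    return 0;
}
```

In restate.c the corresponding changes would be to end main with an explicit return or exit call and to cast the printf calls to void, as is already done for fprintf.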
prof

prof produces a report on the amount of execution time spent in various
portions of your program and the number of times each function is called.
The program must be compiled with the -p option. When a program that
was compiled with that option is run, a file called mon.out is produced.
mon.out and a.out (or whatever name identifies your executable file) are
input to the prof command.

The sequence of steps needed to produce a profile report for our sample 
program is as follows: 

Step 1: Compile the programs with the -p option:

cc -p restate.c oppty.c pft.c rfe.c

Step 2: Run the program to produce a file mon.out.

a.out -opr

Step 3: Execute the prof command:

prof a.out

The example of the output of this last step is shown in Figure 2-23. The 
figures may vary from one run to another. You will also notice that pro- 
grams of very small size, like that used in the example, produce statistics 
that are not overly helpful. 
 %Time  Seconds  Cumsecs  #Calls  msec/call  Name
  60.0     0.03     0.03       3         8.  fcvt
  20.0     0.01     0.04       6         2.  atof
  20.0     0.01     0.05       5         2.  write
   0.0     0.00     0.05                 0.  fwrite
   0.0     0.00     0.05                 0.  monitor
   0.0     0.00     0.05                 0.  creat
   0.0     0.00     0.05                 0.  printf
   0.0     0.00     0.05                 0.  profil
   0.0     0.00     0.05                 0.  fscanf
   0.0     0.00     0.05                 0.  _doscan
   0.0     0.00     0.05                 0.  oppty
   0.0     0.00     0.05                 0.  _filbuf
   0.0     0.00     0.05                 0.  strchr
   0.0     0.00     0.05                 0.  strcmp
   0.0     0.00     0.05                 0.  ldexp
   0.0     0.00     0.05                 0.  getenv
   0.0     0.00     0.05                 0.  fopen
   0.0     0.00     0.05                 0.  _findiop
   0.0     0.00     0.05                 0.  open
   0.0     0.00     0.05                 0.  main
   0.0     0.00     0.05                 0.  read
   0.0     0.00     0.05                 0.  strcpy
   0.0     0.00     0.05      14         0.  ungetc
   0.0     0.00     0.05                 0.  _doprnt
   0.0     0.00     0.05                 0.  pft
   0.0     0.00     0.05                 0.  rfe
   0.0     0.00     0.05                 0.  _xflsbuf
   0.0     0.00     0.05       1         0.  _wrtchk
   0.0     0.00     0.05       2         0.  _findbuf
   0.0     0.00     0.05       2         0.  isatty
   0.0     0.00     0.05       2         0.  ioctl
   0.0     0.00     0.05       1         0.  malloc
   0.0     0.00     0.05       1         0.  memchr
   0.0     0.00     0.05       1         0.  memcpy
   0.0     0.00     0.05       2         0.  sbrk
   0.0     0.00     0.05       4         0.  getopt


Figure 2-23: prof Output 
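
When reading a report like this it can help to check how the columns relate. As a rough sketch (our own helper, not something prof provides), the percentage for a function is its share of the final cumulative time:

```c
/* Share of the total run time, as a percentage.
   seconds:       time attributed to one function
   total_seconds: the last value in the cumulative column
   Returns 0.0 when the total is not positive, to avoid dividing by zero. */
double percent_time(double seconds, double total_seconds)
{
    if (total_seconds <= 0.0)
        return 0.0;
    return 100.0 * seconds / total_seconds;
}
```

For instance, 0.03 seconds out of a 0.05-second total is 60 percent of the run. The values prof prints are rounded, so figures recomputed by hand from the report will not always agree exactly with the printed columns.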
size

size produces information on the number of bytes occupied by the three
sections (text, data, and bss) of a common object file when the program is
brought into main memory to be run. Here are the results of one invocation
of the size command with our object file as an argument:

11832 + 3872 + 2240 = 17944

Don't confuse this number with the number of characters in the object
file that appears when you do an ls -l command. That figure includes the
symbol table and other header information that is not used at run time.
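
As a reminder of what ends up in each of the three sections, here is a small sketch. The names rate_table, scratch, and section_total() are hypothetical, and the placement described in the comments is the usual behavior of C compilation systems rather than something size itself guarantees.

```c
/* Initialized global data is normally placed in the data section. */
int rate_table[4] = { 5, 10, 15, 20 };

/* Uninitialized global data is normally counted in bss; it takes up
   memory at run time but no room in the object file. */
long scratch[256];

/* The machine instructions for a function go in the text section. */
int section_total(int text, int data, int bss)
{
    return text + data + bss;
}
```

Calling section_total(11832, 3872, 2240) gives 17944, the total shown in the size output above.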



strip 

strip removes the symbol and line number information from a common
object file. When you issue this command the number of characters shown
by the ls -l command approaches the figure shown by the size command,
but still includes some header information that is not counted as part of the
.text, .data, or .bss section. After the strip command has been executed, it is
no longer possible to use the file with the sdb command.



sdb 

sdb stands for Symbolic Debugger, which means you can use the sym- 
bolic names in your program to pinpoint where a problem has occurred. 
You can use sdb to debug C, FORTRAN 77, or PASCAL programs. There 
are two basic ways to use sdb: by running your program under control of 
sdb, or by using sdb to rummage through a core image file left by a pro- 
gram that failed. The first way lets you see what the program is doing up 
to the point at which it fails (or to skip around the failure point and 
proceed with the run). The second method lets you check the status at the 
moment of failure, which may or may not disclose the reason the program 
failed. 
Chapter 15 contains a tutorial on sdb that describes the interactive commands
you can use to work your way through your program. For the time
being we want to tell you just a couple of key things you need to do when
using it.

1. Compile your program(s) with the -g option, which causes additional
information to be generated for use by sdb.

2. Run your program under sdb with the command:

sdb myprog - srcdir

where myprog is the name of your executable file (a.out is the
default), and srcdir is an optional list of the directories where source
code for your modules may be found. The dash between the two
arguments keeps sdb from looking for a core image file.
The following three utilities are helpful in keeping your programming
work organized effectively.

The make Command 

When you have a program that is made up of more than one module of
code you begin to run into problems of keeping track of which modules are
up to date and which need to be recompiled when changes are made in
another module. The make command is used to ensure that dependencies
between modules are recorded so that changes in one module result in the
re-compilation of dependent programs. Even control of a program as simple
as the one shown in Figure 2-15 is made easier through the use of make.

The make utility requires a description file that you create with an editor.
The description file (also referred to by its default name: makefile) contains
the information used by make to keep a target file current. The target
file is typically an executable program. A description file contains three
types of information:

dependency information   tells the make utility the relationship between
                         the modules that comprise the target program.

executable commands      needed to generate the target program. make
                         uses the dependency information to determine
                         which executable commands should be passed
                         to the shell for execution.

macro definitions        provide a shorthand notation within the
                         description file to make maintenance easier.
                         Macro definitions can be overridden by information
                         from the command line when the make
                         command is entered.

The make command works by checking the "last changed" time of the 
modules named in the description file. When make finds a component that 
has been changed more recently than modules that depend on it, the 
specified commands (usually compilations) are passed to the shell for execu- 
tion. 
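
The time comparison just described can be sketched in a few lines of C. This is only an illustration of the idea, assuming the standard stat() call and a hypothetical helper named needs_rebuild(); the real make applies further rules (built-in suffix rules, macros, chains of dependencies) that are not modeled here.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Decide whether target is out of date with respect to dependency.
   Returns  1 if target is missing or older than dependency,
            0 if target is at least as new as dependency,
           -1 if dependency itself cannot be examined. */
int needs_rebuild(const char *target, const char *dependency)
{
    struct stat tgt, dep;

    if (stat(dependency, &dep) != 0)
        return -1;                  /* nothing to compare against */
    if (stat(target, &tgt) != 0)
        return 1;                   /* no target yet: must build it */
    return dep.st_mtime > tgt.st_mtime ? 1 : 0;
}
```

For example, needs_rebuild("restate.o", "restate.c") would return 1 right after restate.c has been edited, which is the situation in which make passes the compilation command to the shell.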
The make command takes three kinds of arguments: options, macro
definitions, and target filenames. If no description filename is given as an
option on the command line, make searches the current directory for a file
named makefile or Makefile. Figure 2-24 shows a makefile for our sample
program.



OBJECTS = restate.o oppty.o pft.o rfe.o
all: restate
restate: $(OBJECTS)
	$(CC) $(CFLAGS) $(LDFLAGS) $(OBJECTS) -o restate

$(OBJECTS): ./recdef.h

clean:
	rm -f $(OBJECTS)

clobber: clean
	rm -f restate


Figure 2-24: make Description File 



The following things are worth noticing in this description file: 

■ It identifies the target, restate, as being dependent on the four object 
modules. Each of the object modules in turn is defined as being 
dependent on the header file, recdef.h, and by default, on its 
corresponding source file. 

■ A macro, OBJECTS, is defined as a convenient shorthand for referring 
to all of the component modules. 

Whenever testing or debugging results in a change to one of the com- 
ponents of restate, for example, a command such as the following should be 
entered: 

make CFLAGS=-g restate 
This has been a very brief overview of the make utility. There is more
on make in Chapter 3, and a detailed description of make can be found in
Chapter 13.



The Archive 

The most common use of an archive file, although not the only one, is 
to hold object modules that make up a library. The library can be named on 
the link editor command line (or with a link editor option on the cc com- 
mand line). This causes the link editor to search the symbol table of the 
archive file when attempting to resolve references. 

The ar command is used to create an archive file, to manipulate its con- 
tents and to maintain its symbol table. The structure of the ar command is 
a little different from the normal UNIX system arrangement of command 
line options. When you enter the ar command you include a one-character 
key from the set drqtpmx that defines the type of action you intend. The 
key may be combined with one or more additional characters from the set 
vuaibcls that modify the way the requested operation is performed. The 
makeup of the command line is 

ar -key [posname] afile [name]...

where posname is the name of a member of the archive and may be used
with some optional key characters to make sure that the files in your
archive are in a particular order. The afile argument is the name of your
archive file. By convention, the suffix .a is used to indicate the named file is
an archive file. (libc.a, for example, is the archive file that contains many of
the object files of the standard C subroutines.) One or more names may be
furnished. These identify files that are subjected to the action specified in
the key.
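
The rule for building the key (exactly one action character, plus any number of modifiers) can be restated as a short check. The function ar_key_ok() below is our own hypothetical sketch of that rule only; it is not part of ar, and it does not try to verify which modifiers make sense with which action.

```c
#include <string.h>

/* Return 1 if key holds exactly one action character from "drqtpmx"
   and otherwise only modifier characters from "vuaibcls"; else 0.
   A leading dash, as in -rv, is accepted and skipped. */
int ar_key_ok(const char *key)
{
    int actions = 0;

    if (*key == '-')
        key++;
    if (*key == '\0')
        return 0;                       /* empty key */
    for (; *key != '\0'; key++) {
        if (strchr("drqtpmx", *key) != NULL)
            actions++;                  /* an action character */
        else if (strchr("vuaibcls", *key) == NULL)
            return 0;                   /* not a known modifier either */
    }
    return actions == 1;
}
```

Under this rule "-rv" is acceptable (the action r with the modifier v), while "rq" names two actions and "v" names none.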

We can make an archive file to contain the modules used in our sample 
program, restate. The command to do this is 

ar -rv rste.a restate.o oppty.o pft.o rfe.o 

If these are the only .o files in the current directory, you can use shell
metacharacters as follows:

ar -rv rste.a *.o
Either command will produce this feedback:

a - restate.o
a - oppty.o
a - pft.o
a - rfe.o
ar: creating rste.a



The nm command is used to get a variety of information from the symbol
table of common object files. The object files can be, but don't have to
be, in an archive file. Figure 2-25 shows the output of this command when
executed with the -f (for full) option on the archive we just created. The
object files were compiled with the -g option.
Symbols from rste.a[restate.o]

Name        Value   Class    Type             Size   Line   Section

.0fake              strtag   struct           16
restate.c           file
_cnt        0       strmem   int
_ptr        4       strmem   *Uchar
_base       8       strmem   *Uchar
_flag       12      strmem   char
_file       13      strmem   char
.eos                endstr                    16
rec                 strtag   struct           52
pname       0       strmem   char[25]         25
ppx         28      strmem   float
dp          32      strmem   float
i           36      strmem   float
c           40      strmem   float
t           44      strmem   float
spx         48      strmem   float
.eos                endstr                    52
main        0       extern   int()            520           .text
.bf         10      fcn                              11     .text
argc        0       argm't   int
argv        4       argm't   **char
fin         0       auto     *struct-.0fake   16
oflag       4       auto     int
pflag       8       auto     int
rflag       12      auto     int
ch          16      auto     int
first       20      auto     struct-rec       52
.ef         518     fcn                              61     .text
FILE                typdef   struct-.0fake    16
.text       0       static                    31     39     .text
.data       520     static                           4      .data
.bss        824     static                                  .bss
_iob        0       extern
fprintf     0       extern
exit        0       extern
opterr      0       extern
getopt      0       extern
fopen       0       extern
fscanf      0       extern
printf      0       extern
oppty       0       extern
pft         0       extern
rfe         0       extern


Symbols from rste.a[oppty.o]

Name        Value   Class    Type             Size   Line   Section

oppty.c             file
rec                 strtag   struct           52
pname       0       strmem   char[25]         25
ppx         28      strmem   float
dp          32      strmem   float
i           36      strmem   float
c           40      strmem   float
t           44      strmem   float
spx         48      strmem   float
.eos                endstr                    52
oppty       0       extern   float()          64            .text
.bf         10      fcn                              7      .text
ps          0       argm't   *struct-rec      52
.ef         62      fcn                              3      .text
.text       0       static                    4      1      .text
.data       64      static                                  .data
.bss        72      static                                  .bss


Symbols from rste.a[pft.o]

Name        Value   Class    Type             Size   Line   Section

pft.c               file
rec                 strtag   struct           52
pname       0       strmem   char[25]         25
ppx         28      strmem   float
dp          32      strmem   float
i           36      strmem   float
c           40      strmem   float
t           44      strmem   float
spx         48      strmem   float
.eos                endstr                    52
pft         0       extern   float()          60            .text
.bf         10      fcn                              7      .text
ps          0       argm't   *struct-rec      52
.ef         58      fcn                              3      .text
.text       0       static                    4             .text
.data       60      static                                  .data
.bss        60      static                                  .bss


Symbols from rste.a[rfe.o]

Name        Value   Class    Type             Size   Line   Section

rfe.c               file
rec                 strtag   struct           52
pname       0       strmem   char[25]         25
ppx         28      strmem   float
dp          32      strmem   float
i           36      strmem   float
c           40      strmem   float
t           44      strmem   float
spx         48      strmem   float
.eos                endstr                    52
rfe         0       extern   float()          68            .text
.bf         10      fcn                              8      .text
ps          0       argm't   *struct-rec      52
.ef         64      fcn                              3      .text
.text       0       static                    4      1      .text
.data       68      static                                  .data
.bss        76      static                                  .bss

Figure 2-25: nm Output, with -f Option



For nm to work on an archive file all of the contents of the archive have 
to be object modules. If you have stored other things in the archive, you 
will get the message: 

nm: rste.a bad magic

when you try to execute the command. 
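
The offsets and sizes that nm reports for the members of struct rec in Figure 2-25 (ppx at 28, a total size of 52) can be cross-checked from C itself. The sketch below uses the standard offsetof macro and sizeof; the helper names are hypothetical, and the results noted afterward assume a machine with 4-byte floats aligned on 4-byte boundaries. Other machines may lay the structure out differently.

```c
#include <stddef.h>

/* Same members as recdef.h in Figure 2-15 */
struct rec {
    char  pname[25];
    float ppx;
    float dp;
    float i;
    float c;
    float t;
    float spx;
};

/* Byte offset of the first float member; the 25-byte name is padded
   so that ppx starts on a float boundary. */
size_t rec_ppx_offset(void) { return offsetof(struct rec, ppx); }

/* Total size of one record, including any padding. */
size_t rec_size(void) { return sizeof(struct rec); }
```

On such a machine rec_ppx_offset() returns 28 and rec_size() returns 52, matching the Value and Size columns nm prints for ppx and rec.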
Use of SCCS by Single-User Programmers

The UNIX system Source Code Control System (SCCS) is a set of pro- 
grams designed to keep track of different versions of programs. When a 
program has been placed under control of SCCS, only a single copy of any 
one version of the code can be retrieved for editing at a given time. When 
program code is changed and the program returned to SCCS, only the 
changes are recorded. Each version of the code is identified by its SID, or 
SCCS IDentifying number. By specifying the SID when the code is 
extracted from the SCCS file, it is possible to return to an earlier version. If 
an early version is extracted with the intent of editing it and returning it to 
SCCS, a new branch of the development tree is started. The set of programs 
that make up SCCS appear as UNIX system commands. The commands are: 

admin 

get 

delta 

prs 

rmdel 

cdc 

what 

sccsdiff 

comb 

val 

It is most common to think of SCCS as a tool for project control of large 
programming projects. It is, however, entirely possible for any individual 
user of the UNIX system to set up a private SCCS system. Chapter 14 is an 
SCCS user's guide. 
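
One small habit that carries over to a private SCCS setup is embedding an identification string in each source file so that the what command can report which version went into a program. The sketch below assumes the usual convention that what searches for the four characters @(#); the file name and the SID 1.1 shown are hypothetical, and in practice the string would normally be built from SCCS ID keywords rather than typed by hand.

```c
/* Identification string; the what command looks for the "@(#)" prefix
   and prints the text that follows it. */
static const char sccsid[] = "@(#)restate.c\t1.1";

/* Return the identification text without the "@(#)" marker. */
const char *version_string(void)
{
    return sccsid + 4;
}
```

Because the string is compiled into the object file, it can still be found in the finished executable, where what can locate it.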
Introduction



This chapter deals with programming where the objective is to produce 
sets of programs (applications) that will run on a UNIX system computer. 

The chapter begins with a discussion of how the ground rules change as 
you move up the scale from writing programs that are essentially for your 
own private use (we have called this single-user programming), to working 
as a member of a programming team developing an application that is to be 
turned over to others to use. 

There is a section on how the criteria for selecting appropriate program- 
ming languages may be influenced by the requirements of the application. 

The next three sections of the chapter deal with a number of loosely- 
related topics that are of importance to programmers working in the appli- 
cation development environment. Most of these mirror topics that were 
discussed in Chapter 2, Programming Basics, but here we try to point out 
aspects of the subject that are particularly pertinent to application program- 
ming. They are covered under the following headings: 

Advanced Programming    deals with such topics as File and Record Locking,
                        Interprocess Communication, and programming
                        terminal screens.

Support Tools           covers the Common Object File Format, link editor
                        directives, shared libraries, SDB, and lint.

Project Control Tools   includes some discussion of make and SCCS.

The chapter concludes with a description of a sample application called 
liber that uses several of the components described in earlier portions of the 
chapter. 






Application Programming 



The characteristics of the application programming environment that 
make it different from single-user programming have at their base the need 
for interaction and for sharing of information. 

Numbers 

Perhaps the most obvious difference between application programming 
and single-user programming is in the quantities of the components. Not 
only are applications generally developed by teams of programmers, but the 
number of separate modules of code can grow into the hundreds on even a 
fairly simple application. 

When more than one programmer works on a project, there is a need to 
share such information as: 

■ the operation of each function 

■ the number, identity and type of arguments expected by a function 

■ whether objects passed to a function by pointer are modified by the
called function, and what the lifetime of the pointed-to object is

■ the data type returned by a function 
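
One lightweight way to share the items in the list above is a comment block kept with each function. The sketch below is only an illustration of such a block: the function mark_up() is hypothetical and is not part of the sample program, although struct rec has the same members as recdef.h in Figure 2-15.

```c
struct rec {
    char  pname[25];
    float ppx, dp, i, c, t, spx;
};

/*
 * mark_up -- raise the selling price of a property (hypothetical example)
 *
 * Operation:  increases ps->spx by pct percent.
 * Arguments:  ps   pointer to a record owned by the caller; the record
 *                  IS MODIFIED (member spx only)
 *             pct  percentage increase, e.g. 50 for fifty percent
 * Lifetime:   ps need only remain valid for the duration of the call;
 *             no copy of the pointer is kept
 * Returns:    float, the selling price before the change
 */
float mark_up(struct rec *ps, float pct)
{
    float old = ps->spx;

    ps->spx = old + old * pct / 100;
    return old;
}
```

A caller reading only the comment can tell that the record is changed in place and that the previous price comes back as the return value, without having to read the body.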

In an application, there is an odds-on possibility that the same function 
can be used in many different programs, by many different programmers. 
The object code needs to be kept in a library accessible to anyone on the 
project who needs it. 



Portability 

When you are working on a program to be used on a single model of a 
computer, your concerns about portability are minimal. In application 
development, on the other hand, a desirable objective often is to produce 
code that will run on many different UNIX system computers. Some of the 
things that affect portability will be touched on later in this chapter. 

Documentation

A single-user program has modest needs for documentation. There 
should be enough to remind the program's creator how to use it, and what 
the intent was in portions of the code. 

On an application development project there is a significant need for 
two types of internal documentation: 

■ comments throughout the source code that enable successor program- 
mers to understand easily what is happening in the code. Applica- 
tions can be expected to have a useful life of 5 or more years, and fre- 
quently need to be modified during that time. It is not realistic to 
expect that the same person who wrote the program will always be 
available to make modifications. Even if that does happen the com- 
ments will make the maintenance job a lot easier. 

■ hard-copy descriptions of functions should be available to all 
members of an application development team. Without them it is 
difficult to keep track of available modules, which can result in the 
same function being written over again. 

Unless end-users have clear, readily available instructions on how to
install and use an application, they will either not do it at all (if that
is an option) or do it improperly.

The software industry has become ever more keenly aware of the 
importance of good end-user documentation. There are cases on record 
where the success of a software package has been attributed in large part to 
the fact that it had exceptionally good documentation. There are also cases 
where a pretty good piece of software was not widely used due to the inac- 
cessibility of its manuals. There appears to be no truth to the rumor that in 
one or two cases, end-users have thrown the software away and just read 
the manual. 

Project Management

Without effective project management, an application development
project is in trouble. This subject will not be dealt with in this guide, except to
mention the following three things that are vital functions of project
management:

■ tracking dependencies between modules of code 

■ dealing with change requests in a controlled way 

■ seeing that milestone dates are met 

In this section we talk about some of the considerations that influence
the selection of programming languages, and describe three of the special
purpose languages that are part of the UNIX system environment.



Influences 

In single-user programming the choice of language is often a matter of 
personal preference; a language is chosen because it is the one the program- 
mer feels most comfortable with. 

An additional set of considerations comes into play when making the 
same decision for an application development project. 

1. Is there an existing standard within the organization that should be
observed? 

A firm may decide to emphasize one language because a good 
supply of programmers is available who are familiar with it. 

2. Does one language have better facilities for handling the particular 
algorithm? 

One would like to see all language selection based on such objec- 
tive criteria, but it is often necessary to balance this against the 
skills of the organization. 

3. Is there an inherent compatibility between the language and the 
UNIX operating system? 

This is sometimes the impetus behind selecting C for programs 
destined for a UNIX system machine. 

4. Are there existing tools that can be used? 

If parsing of input lines is an important phase of the application, 
perhaps a parser generator such as yacc should be employed to 
develop what the application needs. 

5. Does the application integrate other software into the whole package?

If, for example, a package is to be built around an existing data 
base management system, there may be constraints on the variety 
of languages the data base management system can accommodate. 



Special Purpose Languages 

The UNIX system contains a number of tools that can be included in the 
category of special purpose languages. Three that are especially interesting 
are awk, lex, and yacc. 

What awk Is Like 

The awk utility scans an ASCII input file record by record, looking for 
matches to specific patterns. When a match is found, an action is taken. 
Patterns and their accompanying actions are contained in a specification file 
referred to as the program. The program can be made up of a number of 
statements. However, since each statement has the potential for causing a 
complex action, most awk programs consist of only a few. The set of state- 
ments may include definitions of the pattern that separates one record from 
another (a newline character, for example), and what separates one field of a 
record from the next (white space, for example). It may also include actions 
to be performed before the first record of the input file is read, and other 
actions to be performed after the final record has been read. All statements 
in between are evaluated in order for each record in the input file. To para- 
phrase the action of a simple awk program, it would go something like this: 

Look through the input file. 

Every time you see this specific pattern, do this action. 

A more complex awk program might be paraphrased like this: 

First do some initialization. 

Then, look through the input file. 

Every time you see this specific pattern, do this action. 

Every time you see this other pattern, do another action. 

After all the records have been read, do these final things. 

The directions for finding the patterns and for describing the actions
can get pretty complicated, but the essential idea is as simple as the two sets 
of statements above. 
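
To make the paraphrase concrete, here is a sketch, written in C rather than awk, of the work a small awk program does on your behalf: an initialization step, one pass over the records, an action for each record that matches a pattern, and a final step. The function name sum_matching, the use of a plain substring as the pattern, and the action (adding up the second field) are all assumptions made for this illustration; awk itself accepts far richer patterns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * sum_matching (hypothetical): treat text as newline-separated records
 * whose fields are separated by blanks or tabs. For every record that
 * contains the substring pattern, add the numeric value of the second
 * field. Returns the total, or -1 if memory cannot be obtained.
 */
long sum_matching(const char *text, const char *pattern)
{
    long total = 0;                     /* "First do some initialization." */
    char *copy, *rec, *next;

    assert(text != NULL && pattern != NULL);
    copy = malloc(strlen(text) + 1);    /* working copy we may cut up */
    if (copy == NULL)
        return -1;
    strcpy(copy, text);

    for (rec = copy; rec != NULL && *rec != '\0'; rec = next) {
        char *nl = strchr(rec, '\n');   /* newline separates records */
        if (nl != NULL) {
            *nl = '\0';
            next = nl + 1;
        } else {
            next = NULL;
        }

        if (strstr(rec, pattern) != NULL) {         /* the pattern */
            char *f = rec + strspn(rec, " \t");     /* skip leading white space */
            f += strcspn(f, " \t");                 /* skip the first field */
            f += strspn(f, " \t");                  /* skip the separator */
            total += atol(f);                       /* the action */
        }
    }
    free(copy);                         /* "do these final things" */
    return total;
}
```

The same job in awk is roughly the one-line program /apple/ { total += $2 } END { print total }, which is the sense in which awk programs can be written very quickly.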

One of the strong points of awk is that once you are familiar with the 
language syntax, programs can be written very quickly. They don't always 
run very fast, however, so they are seldom appropriate if you want to run
the same program repeatedly on large quantities of records. In such a
case, it is likely to be better to translate the program to a compiled 
language. 

How awk Is Used 

One typical use of awk would be to extract information from a file and 
print it out in a report. Another might be to pull fields from records in an 
input file, arrange them in a different order and pass the resulting rear- 
ranged data to a function that adds records to your data base. There is an 
example of a use of awk in the sample application at the end of this 
chapter. 

Where to Find More Information 

The manual page for awk is in Section (1) of the User's Reference Manual.
Chapter 4 in Part 2 of this guide contains a description of the awk syntax 
and a number of examples showing ways in which awk may be used. 

What lex and yacc Are Like 

lex and yacc are often mentioned in the same breath because they per- 
form complementary parts of what can be viewed as a single task: making 
sense out of input. The two utilities also share the common characteristic of 
producing source code for C language subroutines from specifications that 
appear on the surface to be quite similar. 

Recognizing input is a recurring problem in programming. Input can 
be from various sources. In a language compiler, for example, the input is 
normally contained in a file of source language statements. The UNIX sys- 
tem shell language most often receives its input from a person keying in 
commands from a terminal. Frequently, information coming out of one pro- 
gram is fed into another where it must be evaluated. 

The process of input recognition can be subdivided into two tasks: lexical
analysis and parsing, and that's where lex and yacc come in. In both
utilities, the specifications cause the generation of C language subroutines 
that deal with streams of characters; lex generates subroutines that do lexi- 
cal analysis while yacc generates subroutines that do parsing. 

To describe those two tasks in dictionary terms: 

Lexical analysis has to do with identifying the words or vocabu- 
lary of a language as distinguished from its grammar or struc- 
ture. 

Parsing is the act of describing units of the language grammati- 
cally. Students in elementary school are often taught to do this 
with sentence diagrams. 

Of course, the important thing to remember here is that in each case the 
rules for our lexical analysis or parsing are those we set down ourselves in 
the lex or yacc specifications. Because of this, the dividing line between 
lexical analysis and parsing sometimes becomes fuzzy. 

The fact that lex and yacc produce C language source code means that 
these parts of what may be a large programming project can be separately 
maintained. The generated source code is processed by the C compiler to 
produce an object file. The object file can be link edited with others to pro- 
duce programs that then perform whatever process follows from the recog- 
nition of the input. 

How lex Is Used 

A lex subroutine scans a stream of input characters and waves a flag 
each time it identifies something that matches one or another of its rules. 
The waved flag is referred to as a token. The rules are stated in a format 
that closely resembles the one used by the UNIX system text editor for reg- 
ular expressions. For example, 

[ \t]+ 

describes a rule that recognizes a string of one or more blanks or tabs 
(without mentioning any action to be taken). A more complete statement of 
that rule might have this notation: 

[ \t]+ ; 

which, in effect, says to ignore white space. It carries this meaning because 
no action is specified when a string of one or more blanks or tabs is 
recognized. The semicolon marks the end of the statement. Another rule,
one that does take some action, could be stated like this:

[0-9]+          {
                i = atoi(yytext);
                return(NBR);
                }

This rule depends on several things: 

NBR must have been defined as a token in an earlier part of the 
lex source code called the declaration section. (It may be in a 
header file which is #include'd in the declaration section.) 

i is declared as an extern int in the declaration section. 

It is a characteristic of lex that things it finds are made available 
in a character string called yytext. 

Actions can make use of standard C syntax. Here, the standard 
C subroutine, atoi, is used to convert the string to an integer. 

What this rule boils down to is lex saying, "Hey, I found the kind of 
token we call NBR, and its value is now in i." 

To review the steps of the process: 

1. The lex specification statements are processed by the lex utility to
produce a file called lex.yy.c. (This is the standard name for a file
generated by lex, just as a.out is the standard name for the
executable file generated by the link editor.)

2. lex.yy.c is transformed by the C compiler (with a -c option) into an
object file called lex.yy.o that contains a subroutine called yylex().

3. lex.yy.o is link edited with other subroutines. Presumably one of
those subroutines will call yylex() with a statement such as:

        while ((token = yylex()) != 0)

and other subroutines (or even main) will deal with what comes 
back. 

Where to Find More Information

The manual page for lex is in Section (1) of the Programmer's Reference 
Manual. A tutorial on lex is contained in Chapter 5 in Part 2 of this guide. 

How yacc Is Used 

yacc subroutines are produced by pretty much the same series of steps 
as lex: 

1. The yacc specification is processed by the yacc utility to produce a
file called y.tab.c.

2. y.tab.c is compiled by the C compiler, producing an object file,
y.tab.o, that contains the subroutine yyparse(). A significant
difference is that yyparse() calls a subroutine called yylex() to
perform lexical analysis.

3. The object file y.tab.o may be link edited with other subroutines,
one of which will be called yylex().

There are two things worth noting about this sequence: 

1. The parser generated by the yacc specifications calls a lexical
analyzer to scan the input stream and return tokens.

2. While the lexical analyzer is called by the same name as one pro- 
duced by lex, it does not have to be the product of a lex 
specification. It can be any subroutine that does the lexical analysis. 
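
As a sketch of that second point, here is a lexical analyzer written by hand in C that behaves much like the two lex rules shown earlier: it ignores blanks and tabs, and it returns NBR with the value in i when it finds a string of digits. The token value 257, the helper set_input, and the decision to hand back any other character as its own token are assumptions made for this illustration; in a real project the token numbers come from the yacc declarations.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define NBR 257                 /* token number; the value is an assumption */

static const char *input = "";  /* text that remains to be scanned */
int i;                          /* value of the most recent NBR token */

/* set_input (hypothetical helper): choose the string to be scanned */
void set_input(const char *s)
{
    assert(s != NULL);
    input = s;
}

/*
 * A hand-written yylex(). Returns NBR for a string of digits, the
 * character itself for any other non-blank character, and 0 at the
 * end of the input.
 */
int yylex(void)
{
    while (*input == ' ' || *input == '\t')     /* [ \t]+ ;   ignore white space */
        input++;

    if (*input == '\0')
        return 0;                               /* nothing left to scan */

    if (isdigit((unsigned char)*input)) {       /* [0-9]+ */
        i = 0;
        while (isdigit((unsigned char)*input))
            i = i * 10 + (*input++ - '0');
        return NBR;
    }
    return (unsigned char)*input++;             /* single-character token */
}
```

A parser produced by yacc, or a loop of the form while ((token = yylex()) != 0), can call this routine exactly as it would call one generated by lex.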

What really differentiates these two utilities is the format for their rules. 
As noted above, lex rules are regular expressions like those used by UNIX
system editors. yacc rules are chains of definitions and alternative
definitions, written in Backus-Naur form, accompanied by actions. The
rules may refer to other rules defined further down the specification. 
Actions are sequences of C language statements enclosed in braces. They 
frequently contain numbered variables that enable you to reference values 
associated with parts of the rules. An example might make that easier to 
understand: 

%token  NUMBER
%%
expr    :   numb                { $$ = $1; }
        |   expr '+' expr       { $$ = $1 + $3; }
        |   expr '-' expr       { $$ = $1 - $3; }
        |   expr '*' expr       { $$ = $1 * $3; }
        |   expr '/' expr       { $$ = $1 / $3; }
        |   '(' expr ')'        { $$ = $2; }
        ;
numb    :   NUMBER              { $$ = $1; }
        ;



This fragment of a yacc specification shows 

■ NUMBER identified as a token in the declaration section 

■ the start of the rules section indicated by the pair of percent signs 

■ a number of alternate definitions for expr separated by the | sign and 
terminated by the semicolon 

■ actions to be taken when a rule is matched 

■ within actions, numbered variables used to represent components of 
the rule: 

$$ means the value to be returned as the value of the whole rule 

$n means the value associated with the nth component of the rule,
counting from the left

■ numb defined as meaning the token NUMBER. This is a trivial exam- 
ple that illustrates that one rule can be referenced within another, as 
well as within itself. 

As with lex, the compiled yacc object file will generally be link edited with 
other subroutines that handle processing that takes place after the 
parsing — or even ahead of it. 

Where to Find More Information

The manual page for yacc is in Section (1) of the Programmer's Reference
Manual. A detailed description of yacc may be found in Chapter 6 of this
guide.

In Chapter 2 we described the use of such basic elements of programming
in the UNIX system environment as the standard I/O library, header
files, system calls and subroutines. In this section we introduce tools that 
are more apt to be used by members of an application development team 
than by a single-user programmer. The section contains material on the fol- 
lowing topics: 

■ memory management 

■ file and record locking 

■ interprocess communication 

■ programming terminal screens 



Memory Management

There are situations where a program needs to ask the operating system 
for blocks of memory. It may be, for example, that a number of records 
have been extracted from a data base and need to be held for some further 
processing. Rather than writing them out to a file on secondary storage and 
then reading them back in again, it is likely to be a great deal more efficient 
to hold them in memory for the duration of the process. (This is not to 
ignore the possibility that portions of memory may be paged out before the 
program is finished; but such an occurrence is not pertinent to this
discussion.) There are two C language subroutines available for acquiring blocks
of memory and they are both called malloc. One of them is malloc(3C), the
other is malloc(3X). Each has several related functions that do specialized
tasks in the same area. They are:

■ free — to inform the system that space is being relinquished 

■ realloc— to change the size and possibly move the block 

■ calloc — to allocate space for an array and initialize it to zeros

In addition, malloc(3X) has a function, mallopt, that provides for con- 
trol over the space allocation algorithm, and a structure, mallinfo, from 
which the program can get information about the usage of the allocated 
space. 
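
As a minimal sketch of how these routines fit together, the following code holds a growing collection of records in memory. It uses only the malloc(3C) interface. The structure reclist and the three function names are invented for this illustration, and doubling the block each time it fills is simply one common policy, not something the library requires.

```c
#include <assert.h>
#include <stdlib.h>

/* A growable array of record numbers held in memory (hypothetical type). */
struct reclist {
    long   *recs;     /* block obtained from malloc or realloc */
    size_t  used;     /* entries in use */
    size_t  size;     /* entries allocated */
};

/* Start with room for n entries; returns 0 on success, -1 on failure. */
int reclist_init(struct reclist *rl, size_t n)
{
    assert(rl != NULL && n > 0);
    rl->recs = malloc(n * sizeof *rl->recs);
    if (rl->recs == NULL)
        return -1;
    rl->used = 0;
    rl->size = n;
    return 0;
}

/* Append one record, doubling the block with realloc when it is full. */
int reclist_add(struct reclist *rl, long rec)
{
    assert(rl != NULL && rl->recs != NULL);
    if (rl->used == rl->size) {
        long *p = realloc(rl->recs, 2 * rl->size * sizeof *p);
        if (p == NULL)
            return -1;          /* the old block is still valid */
        rl->recs = p;           /* realloc may have moved the block */
        rl->size *= 2;
    }
    rl->recs[rl->used++] = rec;
    return 0;
}

/* Relinquish the space when the records are no longer needed. */
void reclist_free(struct reclist *rl)
{
    assert(rl != NULL);
    free(rl->recs);
    rl->recs = NULL;
    rl->used = rl->size = 0;
}
```

Note that the result of realloc is assigned to a temporary pointer first, so the original block is not lost if the request fails.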

malloc(3X) runs faster than the other version. It is loaded by specifying

        -lmalloc

on the cc(1) or ld(1) command line to direct the link editor to the proper
library. When you use malloc(3X) your program should contain the statement

        #include <malloc.h>

where the values for mallopt options are defined.

See the Programmer's Reference Manual for the formal definitions of the
two mallocs.



File and Record Locking 

The provision for locking files, or portions of files, is primarily used to 
prevent the sort of error that can occur when two or more users of a file try 
to update information at the same time. The classic example is the airlines 
reservation system where two ticket agents each assign a passenger to Seat 
A, Row 5 on the 5 o'clock flight to Detroit. A locking mechanism is 
designed to prevent such mishaps by blocking Agent B from even seeing 
the seat assignment file until Agent A's transaction is complete. 

File locking and record locking are really the same thing, except that
file locking implies the whole file is affected; record locking means that
only a specified portion of the file is locked. (Remember, in the UNIX
system, file structure is undefined; a record is a concept of the programs that
use the file.)

Two types of locks are available: read locks and write locks. If a process 
places a read lock on a file, other processes can also read the file but all are 
prevented from writing to it, that is, changing any of the data. If a process 
places a write lock on a file, no other processes can read or write in the file 
until the lock is removed. Write locks are also known as exclusive locks. 
The term shared lock is sometimes applied to read locks. 

Another distinction needs to be made between mandatory and advisory 
locking. Mandatory locking means that the discipline is enforced automati- 
cally for the system calls that read, write or create files. This is done 
through a permission flag established by the file's owner (or the super-user). 
Advisory locking means that the processes that use the file take the respon- 
sibility for setting and removing locks as needed. Thus mandatory may 
sound like a simpler and better deal, but it isn't so. The mandatory locking
capability is included in the system to comply with an agreement with
/usr/group, an organization that represents the interests of UNIX system
users. The principal weakness in the mandatory method is that the lock is 
in place only while the single system call is being made. It is extremely 
common for a single transaction to require a series of reads and writes 
before it can be considered complete. In cases like this, the term atomic is 
used to describe a transaction that must be viewed as an indivisible unit. 
The preferred way to manage locking in such a circumstance is to make cer- 
tain the lock is in place before any I/O starts, and that it is not removed 
until the transaction is done. That calls for locking of the advisory variety. 

How File and Record Locking Works 

The system call for file and record locking is fcntl(2). Programs should 
include the line 

        #include <fcntl.h>

to bring in the header file shown in Figure 3-1. 

/* Flag values accessible to open(2) and fcntl(2) */
/*  (The first three can only be set by open) */
#define O_RDONLY        0
#define O_WRONLY        1
#define O_RDWR          2
#define O_NDELAY        04      /* Non-blocking I/O */
#define O_APPEND        010     /* append (writes guaranteed at the end) */
#define O_SYNC          020     /* synchronous write option */

/* Flag values accessible only to open(2) */
#define O_CREAT         00400   /* open with file create (uses third open arg) */
#define O_TRUNC         01000   /* open with truncation */
#define O_EXCL          02000   /* exclusive open */

/* fcntl(2) requests */
#define F_DUPFD         0       /* Duplicate fildes */
#define F_GETFD         1       /* Get fildes flags */
#define F_SETFD         2       /* Set fildes flags */
#define F_GETFL         3       /* Get file flags */
#define F_SETFL         4       /* Set file flags */
#define F_GETLK         5       /* Get file lock */
#define F_SETLK         6       /* Set file lock */
#define F_SETLKW        7       /* Set file lock and wait */
#define F_CHKFL         8       /* Check legality of file flag changes */

/* file segment locking set data type - information passed to system by user */
struct flock {
        short   l_type;
        short   l_whence;
        long    l_start;
        long    l_len;          /* len = 0 means until end of file */
        short   l_sysid;
        short   l_pid;
};

/* file segment locking types */
        /* Read lock */
#define F_RDLCK 01
        /* Write lock */
#define F_WRLCK 02
        /* Remove lock(s) */
#define F_UNLCK 03

Figure 3-1: The fcntl.h Header File 

The format of the fcntl(2) system call is

        int fcntl(fildes, cmd, arg)
        int fildes, cmd, arg;

fildes is the file descriptor returned by the open system call. In addition to
defining tags that are used as the commands on fcntl system calls, fcntl.h
includes the declaration for a struct flock that is used to pass values that
control where locks are to be placed.
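
The following sketch shows how an advisory write lock might bracket a complete transaction, so that the lock is in place before any I/O starts and is removed only when the transaction is done. The helper names set_lock and bump_counter, and the idea of a counter stored in the first bytes of the file, are assumptions made for this illustration. The code is written with ANSI prototypes and off_t, as found on current systems, rather than in the older style shown above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * set_lock (hypothetical helper): apply an advisory lock of the given
 * type (F_RDLCK, F_WRLCK or F_UNLCK) to len bytes starting at offset
 * start. F_SETLKW waits until the lock can be granted.
 */
int set_lock(int fildes, short type, off_t start, off_t len)
{
    struct flock lck;

    memset(&lck, 0, sizeof lck);
    lck.l_type = type;
    lck.l_whence = SEEK_SET;    /* start is measured from the beginning */
    lck.l_start = start;
    lck.l_len = len;            /* 0 means "until end of file" */
    return fcntl(fildes, F_SETLKW, &lck);
}

/*
 * bump_counter (hypothetical): read a long from the start of the file,
 * add one, and write it back, all under a single write lock.
 * Returns the new value, or -1 on error.
 */
long bump_counter(int fildes)
{
    long value = 0;
    long result = -1;
    ssize_t n;

    assert(fildes >= 0);
    if (set_lock(fildes, F_WRLCK, 0, (off_t)sizeof value) == -1)
        return -1;

    if (lseek(fildes, 0, SEEK_SET) != (off_t)-1) {
        n = read(fildes, &value, sizeof value);
        if (n == 0)
            value = 0;                  /* empty file: start from zero */
        if (n == 0 || n == (ssize_t)sizeof value) {
            value++;
            if (lseek(fildes, 0, SEEK_SET) != (off_t)-1 &&
                write(fildes, &value, sizeof value) == (ssize_t)sizeof value)
                result = value;
        }
    }
    set_lock(fildes, F_UNLCK, 0, (off_t)sizeof value);  /* transaction done */
    return result;
}
```

Because the locking is advisory, this protects the counter only if every program that updates the file follows the same discipline.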

lockf 

A subroutine, lockf(3), can also be used to lock sections of a file or an 
entire file. The format of lockf is: 

        #include <unistd.h>

int lockf (fildes, function, size) 
int fildes, function; 
long size; 

fildes is the file descriptor; function is one of four control values defined in
unistd.h that let you lock, unlock, test and lock, or simply test to see if a
lock is already in place. size is the number of contiguous bytes to be locked
or unlocked. The section of contiguous bytes can be either forward or
backward from the current offset in the file. (You can arrange to be somewhere
in the middle of the file by using the lseek(2) system call.)
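
As a sketch of that sequence, the code below moves to a record with lseek and then uses lockf with the control values F_TLOCK (test and lock) and F_ULOCK (unlock). The helper names and the assumption of fixed-length records are invented for this illustration; remember that the UNIX system itself attaches no record structure to a file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * try_lock_record (hypothetical helper): attempt to lock record number
 * recno in a file of fixed-length records, each reclen bytes long.
 * Returns 0 if the lock was obtained and -1 if it was not (for
 * example, because another process already holds it).
 */
int try_lock_record(int fildes, long recno, long reclen)
{
    assert(fildes >= 0 && recno >= 0 && reclen > 0);
    /* lockf works from the current offset, so move there first */
    if (lseek(fildes, (off_t)recno * reclen, SEEK_SET) == (off_t)-1)
        return -1;
    return lockf(fildes, F_TLOCK, (off_t)reclen);   /* test and lock */
}

/* unlock_record (hypothetical helper): release the same section */
int unlock_record(int fildes, long recno, long reclen)
{
    assert(fildes >= 0 && recno >= 0 && reclen > 0);
    if (lseek(fildes, (off_t)recno * reclen, SEEK_SET) == (off_t)-1)
        return -1;
    return lockf(fildes, F_ULOCK, (off_t)reclen);
}
```

F_TLOCK returns at once if the section is already locked by another process; F_LOCK would wait instead.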

Where to Find More Information 

There is an example of file and record locking in the sample application 
at the end of this chapter. The manual pages that apply to this facility are 
fcntl(2), fcntl(5), lockf(3), and chmod(2) in the Programmer's Reference
Manual. Chapter 7 in Part 2 of this guide is a detailed discussion of the
subject with a number of examples.



Interprocess Communications

In Chapter 2 we described forking and execing as methods of communi- 
cating between processes. Business applications running on a UNIX system 
computer often need more sophisticated methods. In applications, for 
example, where fast response is critical, a number of processes may be 
brought up at the start of a business day to be constantly available to handle 
transactions on demand. This cuts out initialization time that can add 
seconds to the time required to deal with the transaction. To go back to the
ticket reservation example again for a moment, if a customer calls to reserve 
a seat on the 5 o'clock flight to Detroit, you don't want to have to say, "Yes, 
sir. Just hang on a minute while I start up the reservations program." In 
transaction driven systems, the normal mode of processing is to have all the 
components of the application standing by waiting for some sort of an indi- 
cation that there is work to do. 

To meet requirements of this type the UNIX system offers a set of nine 
system calls and their accompanying header files, all under the umbrella 
name of Interprocess Communications (IPC). 

The IPC system calls come in sets of three; one set each for messages, 
semaphores, and shared memory. These three terms define three different 
styles of communication between processes: 

messages        communication is in the form of data stored in a buffer.
                The buffer can be either sent or received.

semaphores      communication is in the form of positive integers with
                a value between 0 and 32,767. Semaphores may be
                contained in an array, the size of which is determined by
                the system administrator. The default maximum size
                for the array is 25.

shared memory   communication takes place through a common area of
                main memory. One or more processes can attach a
                segment of memory and as a consequence can share
                whatever data is placed there.

The sets of IPC system calls are: 

msgget semget shmget 
msgctl semctl shmctl 
msgop semop shmop 



IPC get Calls 

The get calls each return to the calling program an identifier for the 
type of IPC facility that is being requested. 

IPC ctl Calls

The ctl calls provide a variety of control operations that include obtain- 
ing (IPC_STAT), setting (IPC_SET) and removing (IPC_RMID), the values in 
data structures associated with the identifiers picked up by the get calls. 
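
A minimal sketch of a get call followed by two ctl calls is shown below, using shared memory. The function name shm_size_roundtrip and the choice of a private key (IPC_PRIVATE) with mode 0600 are assumptions made for this illustration; an application whose processes must find the same segment would agree on a key instead.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * shm_size_roundtrip (hypothetical): create a private shared memory
 * segment of nbytes, ask the system for its data structure with
 * IPC_STAT, then remove the segment with IPC_RMID. Returns the
 * segment size the system reports, or -1 on any failure.
 */
long shm_size_roundtrip(size_t nbytes)
{
    struct shmid_ds ds;
    long reported = -1;
    int shmid;

    assert(nbytes > 0);
    shmid = shmget(IPC_PRIVATE, nbytes, IPC_CREAT | 0600);  /* the get call */
    if (shmid == -1)
        return -1;

    if (shmctl(shmid, IPC_STAT, &ds) == 0)                  /* obtain values */
        reported = (long)ds.shm_segsz;

    if (shmctl(shmid, IPC_RMID, (struct shmid_ds *)0) == -1) /* remove */
        return -1;
    return reported;
}
```

Removing the identifier when it is no longer needed matters, because IPC facilities are not released automatically when the creating process exits.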

IPC op Calls 

The op manual pages describe calls that are used to perform the particular
operations characteristic of the type of IPC facility being used. msgop
has calls that send or receive messages. semop (the only one of the three
that is actually the name of a system call) is used to increment or decrement
the value of a semaphore, among other functions. shmop has calls that
attach or detach shared memory segments.
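
The message operations can be sketched as follows, using msgsnd and msgrcv, the two calls described on the msgop manual page. For brevity a single process sends a message to a private queue and reads it back; in a real application the sender and the receiver would be separate processes. The structure request, the size TEXT_MAX, and the function name msg_roundtrip are assumptions made for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define TEXT_MAX 64             /* size chosen for this example only */

/* The first member of a message must be a long message type. */
struct request {
    long mtype;
    char mtext[TEXT_MAX];
};

/*
 * msg_roundtrip (hypothetical): create a private message queue, send
 * one message of type 1, receive it back into out, and remove the
 * queue. Returns the number of bytes of text received, or -1.
 */
int msg_roundtrip(const char *text, char *out, size_t outsize)
{
    struct request snd, rcv;
    size_t len;
    int msqid, n = -1;

    assert(text != NULL && out != NULL && outsize > 0);
    len = strlen(text) + 1;                     /* include the '\0' */
    if (len > TEXT_MAX || len > outsize)
        return -1;

    msqid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);          /* get */
    if (msqid == -1)
        return -1;

    snd.mtype = 1;
    memcpy(snd.mtext, text, len);
    if (msgsnd(msqid, &snd, len, 0) == 0)                   /* op: send */
        n = (int)msgrcv(msqid, &rcv, TEXT_MAX, 1, 0);       /* op: receive */
    if (n > 0)
        memcpy(out, rcv.mtext, (size_t)n);

    msgctl(msqid, IPC_RMID, (struct msqid_ds *)0);          /* ctl: remove */
    return n;
}
```

The size passed to msgsnd and msgrcv counts only the text, not the leading message type.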

Where to Find More Information 

An example of the use of some IPC features is included in the sample 
application at the end of this chapter. The system calls are all located in 
Section (2) of the Programmer's Reference Manual. Don't overlook 
intro(2). It includes descriptions of the data structures that are used by IPC 
facilities. A detailed description of IPC, with many code examples that use 
the IPC system calls, is contained in Chapter 9 in Part 2 of this guide. 



Programming Terminal Screens 

The facility for setting up terminal screens to meet the needs of your 
application is provided by two parts of the UNIX system. The first of these, 
terminfo, is a data base of compiled entries that describe the capabilities of 
terminals and the way they perform various operations. 

The terminfo data base normally begins at the directory 
/usr/lib/terminfo. The members of this directory are themselves direc- 
tories, generally with single-character names that are the first character in 
the name of the terminal. The compiled files of operating characteristics are 
at the next level down the hierarchy. For example, the entry for a Teletype 
5425 is located in both the file /usr/lib/terminfo/5/5425 and the file
/usr/lib/terminfo/t/tty5425.
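
The directory layout can be expressed in a few lines of C. This sketch only builds the file name from a terminal name; it does not read the compiled entry. The function name terminfo_path is invented for this illustration, and the starting directory is the one named in the text, which may differ on other systems.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * terminfo_path (hypothetical helper): build the name of the compiled
 * entry for the terminal called term, using the layout described in
 * the text: /usr/lib/terminfo/<first character>/<name>.
 * Returns 0 on success, -1 if term is empty or buf is too small.
 */
int terminfo_path(const char *term, char *buf, size_t size)
{
    int n;

    assert(term != NULL && buf != NULL);
    if (term[0] == '\0')
        return -1;
    n = snprintf(buf, size, "/usr/lib/terminfo/%c/%s", term[0], term);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Given the name tty5425 the function produces /usr/lib/terminfo/t/tty5425, and given 5425 it produces /usr/lib/terminfo/5/5425.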

Describing the capabilities of a terminal can be a painstaking task.
Quite a good selection of terminal entries is included in the terminfo data
base that comes with your 3B Computer. However, if you have a type of
terminal that is not already described in the data base, the best way to
proceed is to find a description of one that comes close to having the same
capabilities as yours and build on that one. There is a routine
(setupterm) in curses(3X) that can be used to print out descriptions from the data
base. Once you have worked out the code that describes the capabilities of
your terminal, the tic(1M) command is used to compile the entry and add it
to the data base.

curses 

After you have made sure that the operating capabilities of your termi- 
nal are a part of the terminfo data base, you can then proceed to use the 
routines that make up the curses(3X) package to create and manage screens 
for your application. 

The curses library includes functions to: 

■ define portions of your terminal screen as windows 

■ define pads that extend beyond the borders of your physical terminal 
screen and let you see portions of the pad on your terminal 

■ read input from a terminal screen into a program 

■ write output from a program to your terminal screen 

■ manipulate the information in a window in a virtual screen area and 
then send it to your physical screen 

Where to Find More Information

In the sample application at the end of this chapter, we show how you 
might use curses routines. Chapter 10 in Part 2 of this guide contains a 
tutorial on the subject. The manual pages for curses are in Section (3X), and
those for terminfo are in Section (4) of the Programmer's Reference Manual.

This section covers UNIX system components that are part of the programming
environment, but that have a highly specialized use. We refer to
such things as: 

■ link edit command language 

■ Common Object File Format 

■ libraries 

■ Symbolic Debugger 

■ lint as a portability tool 



Link Edit Command Language 

The link editor command language is for use when the default arrangement
of the ld output will not do the job. The default locations for the
standard Common Object File Format sections are described in a.out(4) in
the Programmer's Reference Manual. On a 3B2 Computer, when an a.out file
is loaded into memory for execution, the text segment starts at location
0x80800000, and the data section starts at the next segment boundary after
the end of the text. The stack begins at 0xC0020000 and grows to higher
memory addresses.

The link editor command language provides directives for describing 
different arrangements. The two major types of link editor directives are 
MEMORY and SECTIONS. MEMORY directives can be used to define the 
boundaries of configured and unconfigured sections of memory within a 
machine, to name sections, and to assign specific attributes (read, write, exe- 
cute, and initialize) to portions of memory. SECTIONS directives, among a 
lot of other functions, can be used to bind sections of the object file to 
specific addresses within the configured portions of memory. 

Why would you want to be able to do those things? Well, the truth is 
that in the majority of cases you don't have to worry about it. The need to 
control the link editor output becomes more urgent under two, possibly 
related, sets of circumstances. 

1. Your application is large and consists of a lot of object files.

2. The hardware your application is to run on is tight for space. 

Where to Find More Information 

Chapter 12 in Part 2 of this guide gives a detailed description of the 
subject. 



Common Object File Format 

The details of the Common Object File Format have never been looked 
on as stimulating reading. In fact, they have been recommended to hard- 
core insomniacs as preferred bedtime fare. However, if you're going to 
break into the ranks of really sophisticated UNIX system programmers, 
you're going to have to get a good grasp of COFF. A knowledge of COFF is 
fundamental to using the link editor command language. It is also good 
background knowledge for tasks such as: 

■ setting up archive libraries or shared libraries 

■ using the Symbolic Debugger 

The following system header files contain definitions of data structures 
of parts of the Common Object File Format: 

<syms.h>        symbol table format
<linenum.h>     line number entries
<ldfcn.h>       COFF access routines
<filehdr.h>     file header for a common object file
<a.out.h>       common assembler and link editor output
<scnhdr.h>      section header for a common object file
<reloc.h>       relocation information for a common object file
<storclass.h>   storage classes for common object files



The object file access routines are described below under the heading
"The Object File Library."

Where to Find More Information

Chapter 11 in Part 2 of this guide gives a detailed description of COFF. 



Libraries 

A library is a collection of related object files and/or declarations that
simplify programming effort. Programming groups involved in the 
development of applications often find it convenient to establish private 
libraries. For example, an application with a number of programs using a 
common data base can keep the I/O routines in a library that is searched at 
link edit time.

The archive libraries are collections of common object format files stored
in an archive (filename.a) file that is searched by the link editor to resolve
references. Files in the archive that are needed to satisfy unresolved refer- 
ences become a part of the resulting executable. 

Shared libraries are similar to archive libraries in that they are collec- 
tions of object files that are acted upon by the link editor. The difference, 
however, is that shared libraries perform a static linking between the file in 
the library and the executable that is the output of ld. The result is a saving
of space, because all executables that need a file from the library share a sin- 
gle copy. We go into shared libraries later in this section. 

In Chapter 2 we described many of the functions that are found in the 
standard C library, libc.a. The next two sections describe two other 
libraries, the object file library and the math library. 

The Object File Library 

The object file library provides functions for the access and manipula- 
tion of object files. Some functions locate portions of an object file such as 
the symbol table, the file header, sections, and line number entries associ- 
ated with a function. Other functions read these types of entries into 
memory. The need to work at this level of detail with object files occurs 
most often in the development of new tools that manipulate object files. 
For a description of the format of an object file, see "The Common Object 
File Format" in Chapter 11. This library consists of several portions. The 
functions reside in /lib/libld.a and are loaded during the compilation of a 
C language program by the -l command line option:

cc file -lld

which causes the link editor to search the object file library. The argument
-lld must appear after all files that reference functions in libld.a.

The following header files must be included in the source code. 

#include <stdio.h>
#include <a.out.h>
#include <ldfcn.h>



Function    Reference       Brief Description

ldaclose    ldclose(3X)     Close object file being processed.
ldahread    ldahread(3X)    Read archive header.
ldaopen     ldopen(3X)      Open object file for reading.
ldclose     ldclose(3X)     Close object file being processed.
ldfhread    ldfhread(3X)    Read file header of object file being
                            processed.
ldgetname   ldgetname(3X)   Retrieve the name of an object file
                            symbol table entry.
ldlinit     ldlread(3X)     Prepare object file for reading line
                            number entries via ldlitem.
ldlitem     ldlread(3X)     Read line number entry from object
                            file after ldlinit.
ldlread     ldlread(3X)     Read line number entry from object
                            file.
ldlseek     ldlseek(3X)     Seeks to the line number entries of the
                            object file being processed.
ldnlseek    ldlseek(3X)     Seeks to the line number entries of the
                            object file being processed given the
                            name of a section.
ldnrseek    ldrseek(3X)     Seeks to the relocation entries of the
                            object file being processed given the
                            name of a section.
ldnshread   ldshread(3X)    Read section header of the named section
                            of the object file being processed.
ldnsseek    ldsseek(3X)     Seeks to the section of the object file
                            being processed given the name of a
                            section.
ldohseek    ldohseek(3X)    Seeks to the optional file header of the
                            object file being processed.
ldopen      ldopen(3X)      Open object file for reading.
ldrseek     ldrseek(3X)     Seeks to the relocation entries of the
                            object file being processed.
ldshread    ldshread(3X)    Read section header of an object file
                            being processed.
ldsseek     ldsseek(3X)     Seeks to the section of the object file
                            being processed.
ldtbindex   ldtbindex(3X)   Returns the long index of the symbol
                            table entry at the current position of
                            the object file being processed.
ldtbread    ldtbread(3X)    Reads a specific symbol table entry of
                            the object file being processed.
ldtbseek    ldtbseek(3X)    Seeks to the symbol table of the object
                            file being processed.
sgetl       sputl(3X)       Access long integer data in a machine
                            independent format.
sputl       sputl(3X)       Translate a long integer into a
                            machine independent format.
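
The idea behind sputl and sgetl, storing a long integer in a byte order
that does not depend on the host machine, can be sketched in portable C.
The helper names put_long_be and get_long_be are hypothetical, and the
4-byte, most-significant-byte-first layout is an assumption made for
illustration only; see sputl(3X) for the format the library actually uses.

```c
/* Hypothetical helpers, not the libld routines: store and fetch a
 * 32-bit quantity as 4 bytes, most significant byte first, so the
 * buffer contents do not depend on the host byte order. */
void put_long_be(unsigned long value, unsigned char *buf)
{
    buf[0] = (unsigned char) ((value >> 24) & 0xff);
    buf[1] = (unsigned char) ((value >> 16) & 0xff);
    buf[2] = (unsigned char) ((value >> 8) & 0xff);
    buf[3] = (unsigned char) (value & 0xff);
}

/* Rebuild the value from the 4 bytes written by put_long_be. */
unsigned long get_long_be(const unsigned char *buf)
{
    return ((unsigned long) buf[0] << 24) |
           ((unsigned long) buf[1] << 16) |
           ((unsigned long) buf[2] << 8)  |
            (unsigned long) buf[3];
}
```

A value written with put_long_be on one machine and read back with
get_long_be on another comes out the same, whatever the native byte
order of either machine.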
Common Object File Interface Macros (ldfcn.h)

The interface between the calling program and the object file access
routines is based on the defined type LDFILE, which is in the header file
ldfcn.h (see ldfcn(4)). The primary purpose of this structure is to provide
uniform access to both simple object files and to object files that are 
members of an archive file. 

The function ldopen(3X) allocates and initializes the LDFILE structure 
and returns a pointer to the structure. The fields of the LDFILE structure 
may be accessed individually through the following macros: 

■ The TYPE macro returns the magic number of the file, which is used 
to distinguish between archive files and object files that are not part 
of an archive. 

■ The IOPTR macro returns the file pointer, which was opened by
ldopen(3X) and is used by the input/output functions of the C 
library. 

■ The OFFSET macro returns the file address of the beginning of the 
object file. This value is non-zero only if the object file is a member 
of the archive file. 

■ The HEADER macro accesses the file header structure of the object
file.

Additional macros are provided to access an object file. These macros 
parallel the input/output functions in the C library; each macro translates a 
reference to an LDFILE structure into a reference to its file descriptor field. 
The available macros are described in ldfcn(4) in the Programmer's Reference
Manual.

The Math Library 

The math library package consists of functions and a header file. The 
functions are located and loaded during the compilation of a C language 
program by the -l option on a command line, as follows:

cc file -lm

This option causes the link editor to search the math library, libm.a. In
addition to the request to load the functions, the header file of the math
library should be included in the program being compiled. This is
accomplished by including the line:

#include <math.h>

near the beginning of each file that uses the routines. 

The functions are grouped into the following categories: 

■ trigonometric functions 

■ Bessel functions 

■ hyperbolic functions 

■ miscellaneous functions 

Trigonometric Functions 

These functions are used to compute angles (in radian measure), sines, 
cosines, and tangents. All of these values are expressed in double-precision. 



Function    Reference   Brief Description

acos        trig(3M)    Return arc cosine.
asin        trig(3M)    Return arc sine.
atan        trig(3M)    Return arc tangent.
atan2       trig(3M)    Return arc tangent of a ratio.
cos         trig(3M)    Return cosine.
sin         trig(3M)    Return sine.
tan         trig(3M)    Return tangent.



Bessel Functions 

These functions calculate Bessel functions of the first and second kinds
of several orders for real values. The Bessel functions are j0, j1, jn, y0, y1,
and yn. The functions are described in bessel(3M).

Hyperbolic Functions

These functions are used to compute the hyperbolic sine, cosine, and
tangent for real values.



Function    Reference   Brief Description

cosh        sinh(3M)    Return hyperbolic cosine.
sinh        sinh(3M)    Return hyperbolic sine.
tanh        sinh(3M)    Return hyperbolic tangent.



Miscellaneous Functions 

These functions cover a wide variety of operations, such as natural
logarithm, exponential, and absolute value. In addition, several are
provided to round double-precision numbers to integral values.



Function    Reference     Brief Description

ceil        floor(3M)     Returns the smallest integer not less
                          than a given value.
exp         exp(3M)       Returns the exponential function of a
                          given value.
fabs        floor(3M)     Returns the absolute value of a given
                          value.
floor       floor(3M)     Returns the largest integer not greater
                          than a given value.
fmod        floor(3M)     Returns the remainder produced by the
                          division of two given values.
gamma       gamma(3M)     Returns the natural log of the absolute
                          value of the result of applying the
                          gamma function to a given value.
hypot       hypot(3M)     Return the square root of the sum of
                          the squares of two numbers.
log         exp(3M)       Returns the natural logarithm of a
                          given value.
log10       exp(3M)       Returns the logarithm base ten of a
                          given value.
matherr     matherr(3M)   Error-handling function.
pow         exp(3M)       Returns the result of a given value
                          raised to another given value.
sqrt        exp(3M)       Returns the square root of a given
                          value.



Shared Libraries 

As noted above, shared libraries are also available on the UNIX system. 
Not only are some system libraries (libc and the networking library) avail- 
able in both archive and shared library form, but also applications have the 
option of creating private application shared libraries. 

The reason why shared libraries are desirable is that they save space, 
both on disk and in memory. With an archive library, when the link editor 
goes to the archive to resolve a reference it takes a copy of the object file 
that it needs for the resolution and binds it into the a.out file. From that 
point on the copied file is a part of the executable, whether it is in memory 
to be run or sitting in secondary storage. If you have a lot of executables 
that use, say, printf (which just happens to require much of the standard 
I/O library) you can be talking about a sizeable amount of space. 

With a shared library, the link editor does not copy code into the exe- 
cutable files. When the operating system starts a process that uses a shared 
library it maps the shared library contents into the address space of the pro- 
cess. Only one copy of the shared code exists, and many processes can use 
it at the same time.

This fundamental difference between archives and shared libraries has
another significant aspect. When code in an archive library is modified, all
existing executables are unaffected. They continue using the older version
until they are re-link edited. When code in a shared library is modified, all 
programs that share that code use the new version the next time they are 
executed. 

All this may sound like a really terrific deal, but as with most things in 
life there are complications. To begin with, in the paragraphs above we 
didn't give you quite all the facts. For example, each process that uses 
shared library code gets its own copy of the entire data region of the 
library. It is actually only the text region that is really shared. So the truth 
is that shared libraries can add space to executing a.out's even though the 
chances are good that they will cause more shrinkage than expansion. 
What this means is that when there is a choice between using a shared 
library and an archive, you shouldn't use the shared library unless it saves 
space. If you were using a shared libc to access only strcmp, for example, 
you would pick up more in shared library data than you would save by 
sharing the text. 

The answer to this problem, and to others that are somewhat more com- 
plex, is to assign the responsibility for shared libraries to a central person or 
group within the application. The shared library developer should be the 
one to resolve questions of when to use shared and when to use archive 
system libraries. If a private library is to be built for your application, one 
person or organization should be responsible for its development and 
maintenance. 

Where to Find More Information 

The sample application at the end of this chapter includes an example 
of the use of a shared library. Chapter 8 in Part 2 of this guide describes 
how shared libraries are built and maintained. 



Symbolic Debugger 

The use of sdb was mentioned briefly in Chapter 2. In this section we 
want to say a few words about sdb within the context of an application 
development project.

sdb works on a process, and enables a programmer to find errors in the
code. It is a tool a programmer might use while coding and unit testing a
program, to make sure it runs according to its design. sdb would normally
be used prior to the time the program is turned over, along with the rest of
the application, to testers. During this phase of the application
development cycle programs are compiled with the -g option of cc to
facilitate the use of the debugger. The symbol table should not be stripped
from the object file. Once the programmer is satisfied that the program is
error-free, strip(1) can be used to reduce the file storage overhead taken
by the file.

If the application uses a private shared library, the possibility arises that 
a program bug may be located in a file that resides in the shared library. 
Dealing with a problem of this sort calls for coordination by the administra- 
tor of the shared library. Any change to an object file that is part of a 
shared library means the change affects all processes that use that file. One
program's bug may be another program's feature. 

Where to Find More Information 

Chapter 15 in Part 2 of this guide contains information on how to use 
sdb. The manual page is in Section (1) of the Programmer's Reference Manual.



lint as a Portability Tool 

It is a characteristic of the UNIX system that language compilation sys- 
tems are somewhat permissive. Generally speaking it is a design objective 
that a compiler should run fast. Most C compilers, therefore, let some 
things go unflagged as long as the language syntax is observed statement by 
statement. This sometimes means that while your program may run, the 
output will have some surprises. It also sometimes means that while the 
program may run on the machine on which the compilation system runs, 
there may be real difficulties in running it on some other machine. 

That's where lint comes in. lint produces comments about inconsistencies
in the code. The types of anomalies flagged by lint are:

■ cases of disagreement between the type of value expected from a 
called function and what the function actually returns 

■ disagreement between the types and number of arguments expected 
by functions and what the function receives 






■ inconsistencies that might prove to be bugs 

■ things that might cause portability problems

Here is an example of a portability problem that would be caught by 
lint. 

Code such as this: 

int i = lseek(fdes, offset, whence);

would get by most compilers. However, lseek returns a long integer
representing the address of a location in the file. On a machine with a
16-bit integer and a bigger long int, it would produce incorrect results,
because i would contain only the last 16 bits of the value returned.
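
The effect can be reproduced on a machine with wider integers by forcing
an offset through a 16-bit type. This is a sketch only: the function name
low16 is hypothetical, and uint16_t from <stdint.h> stands in for the
16-bit int of the machine described above.

```c
#include <stdint.h>

/* Keep only the low-order 16 bits of a file offset, which is all that
 * survives when the long result of lseek is stored in a 16-bit int. */
uint16_t low16(long offset)
{
    return (uint16_t) ((unsigned long) offset & 0xffffUL);
}
```

An offset of 70000, for example, would come back as 4464 (that is,
70000 - 65536), and the program would go on to use the wrong position in
the file without any complaint from the compiler.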

Since it is reasonable to expect that an application written for a UNIX 
system machine will be able to run on a variety of computers, it is impor- 
tant that the use of lint be a regular part of the application development. 

Where to Find More Information 

Chapter 16 in Part 2 of this guide contains a description of lint with 
examples of the kinds of conditions it uncovers. The manual page is in
Section (1) of the Programmer's Reference Manual.

Volumes have been written on the subject of project control. It is an
item of top priority for the managers of any application development team. 
Two UNIX system tools that can play a role in this area are described in this 
section. 



make 

make is extremely useful in an application development project for 
keeping track of what object files need to be recompiled as changes are 
made to source code files. One of the characteristics of programs in a UNIX 
system environment is that they are made up of many small pieces, each in 
its own object file, that are link edited together to form the executable file. 
Quite a few of the UNIX system tools are devoted to supporting that style 
of program architecture. For example, archive libraries, shared libraries and 
even the fact that the cc command accepts .o files as well as .c files, and that 
it can stop short of the ld step and produce .o files instead of an a.out, are
all important elements of modular architecture. The two main advantages 
of this type of programming are that 

■ A file that performs one function can be re-used in any program that 
needs it. 

■ When one function is changed, the whole program does not have to 
be recompiled. 

On the flip side, however, a consequence of the proliferation of object 
files is an increased difficulty in keeping track of what does need to be 
recompiled, and what doesn't. make is designed to help deal with this
problem. You use make by describing in a specification file, called 
makefile, the relationship (that is, the dependencies) between the different 
files of your program. Once having done that, you conclude a session in 
which possibly a number of your source code files have been changed by 
running the make command. make takes care of generating a new a.out by
comparing the time-last-changed of your source code files with the
dependency rules you have given it.

make has the ability to work with files in archive libraries or under
control of the Source Code Control System (SCCS).
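
The time comparison at the heart of make can be sketched in C. This is
not how make itself is implemented; the function names file_mtime and
out_of_date are hypothetical, and the sketch covers only a single target
with a single dependency.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Return the time-last-changed of a file, or (time_t) -1 if the
 * file cannot be examined (for example, because it does not exist). */
time_t file_mtime(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return (time_t) -1;
    return sb.st_mtime;
}

/* A target needs rebuilding if it is missing, or if the file it
 * depends on was changed more recently than the target was.
 * A missing dependency is not handled in this sketch. */
int out_of_date(time_t target_time, time_t dependency_time)
{
    if (target_time == (time_t) -1)
        return 1;
    return dependency_time > target_time;
}
```

A call such as out_of_date(file_mtime("oppty.o"), file_mtime("oppty.c"))
answers the question make asks for each dependency line: has the source
been changed since the object file was last built?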

Where to Find More Information 

The make(1) manual page is contained in the Programmer's Reference
Manual. Chapter 13 in Part 2 of this guide gives a complete description of 
how to use make. 



SCCS 

SCCS is an acronym for Source Code Control System. It consists of a set 
of 14 commands used to track evolving versions of files. Its use is not lim- 
ited to source code; any text files can be handled, so an application's docu- 
mentation can also be put under control of SCCS. SCCS can: 

■ store and retrieve files under its control 

■ allow no more than a single copy of a file to be edited at one time 

■ provide an audit trail of changes to files 

■ reconstruct any earlier version of a file that may be wanted 

SCCS files are stored in a special coded format. Only through com- 
mands that are part of the SCCS package can files be made available in a 
user's directory for editing, compiling, etc. From the point at which a file is 
first placed under SCCS control, only changes to the original version are 
stored. For example, let's say that the program, restate, that was used in 
several examples in Chapter 2, was controlled by SCCS. One of the original 
pieces of that program is a file called oppty.c that looks like this:

/* Opportunity Cost - oppty.c */

#include "recdef.h"

float
oppty(ps)
struct rec *ps;
{
    return(ps->i/12 * ps->t * ps->dp);
}



If you decide to add a message to this function, you might change the 
file like this: 




/* Opportunity Cost - oppty.c */

#include "recdef.h"
#include <stdio.h>

float
oppty(ps)
struct rec *ps;
{
    (void) fprintf(stderr, "Opportunity calling\n");
    return(ps->i/12 * ps->t * ps->dp);
}

SCCS saves only the two new lines from the second version, with a
coded notation that shows where in the text the two lines belong. It also 
includes a note of the version number, lines deleted, lines inserted, total 
lines in the file, the date and time of the change and the login id of the per- 
son making the change. 
Where to Find More Information

Chapter 14 in Part 2 of this guide is an SCCS user's guide. SCCS
commands are in Section (1) of the Programmer's Reference Manual.



liber, A Library System

To illustrate the use of UNIX system programming tools in the develop-
ment of an application, we are going to pretend we are engaged in the 
development of a computer system for a library. The system is known as 
liber. The early stages of system development, we assume, have already 
been completed; feasibility studies have been done, the preliminary design 
is described in the coming paragraphs. We are going to stop short of pro- 
ducing a complete detailed design and module specifications for our system. 
You will have to accept that these exist. In using portions of the system for 
examples of the topics covered in this chapter, we will work from these vir- 
tual specifications. 

We make no claim as to the efficacy of this design. It is the way it is 
only in order to provide some passably realistic examples of UNIX system 
programming tools in use.

liber is a system for keeping track of the books in a library. The
hardware consists of a single computer with terminals throughout the 
library. One terminal is used for adding new books to the data base. Oth- 
ers are used for checking out books and as electronic card catalogs. 

The design of the system calls for it to be brought up at the beginning 
of the day and remain running while the library is in operation. The sys- 
tem has one master index that contains the unique identifier of each title in 
the library. When the system is running the index resides in memory. 
Semaphores are used to control access to the index. In the pages that follow 
fragments of some of the system's programs are shown to illustrate the way 
they work together. The startup program performs the system initialization:
opening the semaphores and shared memory, reading the index into the
shared memory, and kicking off the other programs. The id numbers for
the shared memory and semaphores (shmid, wrtsem, and rdsem) are read 
from a file during initialization. The programs all share the in-memory 
index. They attach it with the following code:

/* attach shared memory for index */

if ((int) (index = (INDEX *) shmat(shmid, NULL, 0)) == -1)
{
    (void) fprintf(stderr, "shmat failed: %d\n", errno);
    exit(1);
}
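
On current systems shmat is declared to return a generic pointer, and its
failure value is written (void *) -1. The sketch below shows the same
attach step in that style. It is an illustration under stated assumptions,
not part of liber: the function attach_index is hypothetical, and a plain
char pointer stands in for the INDEX type, whose layout the fragments do
not give.

```c
#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Attach the shared memory segment identified by shmid.
 * Returns the attached address, or NULL if shmat fails. */
char *attach_index(int shmid)
{
    void *addr = shmat(shmid, NULL, 0);

    if (addr == (void *) -1) {
        (void) fprintf(stderr, "shmat failed: %d\n", errno);
        return NULL;
    }
    return (char *) addr;
}
```

Comparing against (void *) -1 avoids squeezing a pointer through an int,
which is exactly the sort of conversion that causes trouble when a
pointer is wider than an int.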



Of the programs shown, add-books is the only one that alters the index. 
The semaphores are used to ensure that no other programs will try to read 
the index while add-books is altering it. The checkout program locks the 
file record for the book, so that each copy being checked out is recorded 
separately and the book cannot be checked out at two different checkout sta- 
tions at the same time. 
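
The record locking that checkout relies on can be sketched with fcntl.
The helper names lock_record and unlock_record are hypothetical; the
program fragments in this chapter set the flock fields inline instead.
SEEK_SET is used here where the listings write 0.

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Place a write lock on len bytes starting at byte offset pos,
 * waiting if another process already holds a conflicting lock.
 * Returns 0 on success, -1 on failure. */
int lock_record(int fd, long pos, long len)
{
    struct flock flk;

    flk.l_type = F_WRLCK;
    flk.l_whence = SEEK_SET;
    flk.l_start = pos;
    flk.l_len = len;
    return fcntl(fd, F_SETLKW, &flk) == -1 ? -1 : 0;
}

/* Release the lock on the same range of bytes. */
int unlock_record(int fd, long pos, long len)
{
    struct flock flk;

    flk.l_type = F_UNLCK;
    flk.l_whence = SEEK_SET;
    flk.l_start = pos;
    flk.l_len = len;
    return fcntl(fd, F_SETLK, &flk) == -1 ? -1 : 0;
}
```

Because only the bytes of one record are locked, two checkout stations
can work on different books at the same time; they wait for each other
only when both ask for the same record.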

The program fragments do not provide any details on the structure of 
the index or the book records in the data base. 



/* liber.h - header file for the
 *           library system.
 */

typedef . . . INDEX;    /* data structure for book file index */
typedef struct {        /* type of records in book file */
    char title[30];
    char author[30];

} BOOK;
int shmid;
int wrtsem;
int rdsem;
INDEX *index;

int book_file;
BOOK book_buf;
/* startup program */

/*
 * 1. Open shared memory for file index and read it in.
 * 2. Open two semaphores for providing exclusive write access to index.
 * 3. Stash id's for shared memory segment and semaphores in a file
 *    where they can be accessed by the programs.
 * 4. Start programs: add-books, card-catalog, and checkout running
 *    on the various terminals throughout the library.
 */

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/sem.h>
#include "liber.h"

void exit();
extern int errno;

key_t key;
int shmid;
int wrtsem;
int rdsem;
FILE *ipc_file;

main()
{
    if ((shmid = shmget(key, sizeof(INDEX), IPC_CREAT | 0666)) == -1)
    {
        (void) fprintf(stderr, "startup: shmget failed: errno=%d\n", errno);
        exit(1);
    }

    if ((wrtsem = semget(key, 1, IPC_CREAT | 0666)) == -1)
    {
        (void) fprintf(stderr, "startup: semget failed: errno=%d\n", errno);
        exit(1);
    }

    if ((rdsem = semget(key, 1, IPC_CREAT | 0666)) == -1)
    {
        (void) fprintf(stderr, "startup: semget failed: errno=%d\n", errno);
        exit(1);
    }

    (void) fprintf(ipc_file, "%d\n%d\n%d\n", shmid, wrtsem, rdsem);

    /*
     * Start the add-books program running on the terminal in the
     * basement. Start the checkout and card-catalog programs
     * running on the various other terminals throughout the library.
     */
}



/* card-catalog program */

/*
 * 1. Read screen for author and title.
 * 2. Use semaphores to prevent reading index while it is being written.
 * 3. Use index to get position of book record in book file.
 * 4. Print book record on screen or indicate book was not found.
 * 5. Go to 1.
 */

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <fcntl.h>
#include "liber.h"

void exit();
extern int errno;
struct sembuf sop[1];

main()
{
    while (1)
    {
        /*
         * Read author/title/subject information from screen.
         */

        /*
         * Wait for write semaphore to reach 0 (index not being written).
         */
        sop[0].sem_op = 0;
        if (semop(wrtsem, sop, 1) == -1)
        {
            (void) fprintf(stderr, "semop failed: %d\n", errno);
            exit(1);
        }

        /*
         * Increment read semaphore so potential writer will wait
         * for us to finish reading the index.
         */
        sop[0].sem_op = 1;
        if (semop(rdsem, sop, 1) == -1)
        {
            (void) fprintf(stderr, "semop failed: %d\n", errno);
            exit(1);
        }

        /* Use index to find file pointer(s) for book(s) */

        /* Decrement read semaphore */
        sop[0].sem_op = -1;
        if (semop(rdsem, sop, 1) == -1)
        {
            (void) fprintf(stderr, "semop failed: %d\n", errno);
            exit(1);
        }

        /*
         * Now we use the file pointers found in the index to
         * read the book file. Then we print the information
         * on the book(s) to the screen.
         */
    } /* while */
}



/* checkout program */

/*
 * 1. Read screen for Dewey Decimal number of book to be checked out.
 * 2. Use semaphores to prevent reading index while it is being written.
 * 3. Use index to get position of book record in book file.
 * 4. If book not found print message on screen, otherwise lock
 *    book record and read.
 * 5. If book already checked out print message on screen, otherwise
 *    mark record "checked out" and write back to book file.
 * 6. Unlock book record.
 * 7. Go to 1.
 */

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <fcntl.h>
#include "liber.h"

void exit();
long lseek();
extern int errno;
struct flock flk;
struct sembuf sop[1];
long bookpos;

main()
{
    while (1)
    {
        /*
         * Read Dewey Decimal number from screen.
         */

        /*
         * Wait for write semaphore to reach 0 (index not being written).
         */
        sop[0].sem_flg = 0;
        sop[0].sem_op = 0;
        if (semop(wrtsem, sop, 1) == -1)
        {
            (void) fprintf(stderr, "semop failed: %d\n", errno);
            exit(1);
        }

        /*
         * Increment read semaphore so potential writer will wait
         * for us to finish reading the index.
         */
        sop[0].sem_op = 1;
        if (semop(rdsem, sop, 1) == -1)
        {
            (void) fprintf(stderr, "semop failed: %d\n", errno);
            exit(1);
        }

        /*
         * Now we can use the index to find the book's record position.
         * Assign this value to "bookpos".
         */

        /* Decrement read semaphore */
        sop[0].sem_op = -1;
        if (semop(rdsem, sop, 1) == -1)
        {
            (void) fprintf(stderr, "semop failed: %d\n", errno);
            exit(1);
        }

        /* Lock the book's record in book file, read the record. */
        flk.l_type = F_WRLCK;
        flk.l_whence = 0;
        flk.l_start = bookpos;
        flk.l_len = sizeof(BOOK);
        if (fcntl(book_file, F_SETLKW, &flk) == -1)
        {
            (void) fprintf(stderr, "trouble locking: %d\n", errno);
            exit(1);
        }

        if (lseek(book_file, bookpos, 0) == -1)
        {
            /* Error processing for lseek */
        }

        if (read(book_file, &book_buf, sizeof(BOOK)) == -1)
        {
            /* Error processing for read */
        }

        /*
         * If the book is checked out inform the client, otherwise
         * mark the book's record as checked out and write it
         * back into the book file.
         */

        /* Unlock the book's record in book file. */
        flk.l_type = F_UNLCK;
        if (fcntl(book_file, F_SETLK, &flk) == -1)
        {
            (void) fprintf(stderr, "trouble unlocking: %d\n", errno);
            exit(1);
        }
    } /* while */
}



/* add-books program */

/*
 * 1. Read a new book entry from screen.
 * 2. Insert book in book file.
 * 3. Use semaphore "wrtsem" to block new readers.
 * 4. Wait for semaphore "rdsem" to reach 0.
 * 5. Insert book into index.
 * 6. Decrement wrtsem.
 * 7. Go to 1.
 */

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include "liber.h"

void exit();
extern int errno;
struct sembuf sop[1];
BOOK bookbuf;

main()
{
    for (;;)
    {
        /*
         * Read information on new book from screen.
         */
        addscr(&bookbuf);

        /* write new record at the end of the bookfile.
         * Code not shown, but
         * addscr() returns a 1 if title information has
         * been entered, 0 if not.
         */

        /*
         * Increment write semaphore, blocking new readers from
         * accessing the index.
         */
        sop[0].sem_flg = 0;
        sop[0].sem_op = 1;
        if (semop(wrtsem, sop, 1) == -1)
        {
            (void) fprintf(stderr, "semop failed: %d\n", errno);
            exit(1);
        }

        /*
         * Wait for read semaphore to reach 0 (all readers to finish
         * using the index).
         */
        sop[0].sem_op = 0;
        if (semop(rdsem, sop, 1) == -1)
        {
            (void) fprintf(stderr, "semop failed: %d\n", errno);
            exit(1);
        }

        /*
         * Now that we have exclusive access to the index we
         * insert our new book with its file pointer.
         */

        /* Decrement write semaphore, permitting readers to read index. */
        sop[0].sem_op = -1;
        if (semop(wrtsem, sop, 1) == -1)
        {
            (void) fprintf(stderr, "semop failed: %d\n", errno);
            exit(1);
        }
    } /* for */
}
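
The handshake that the liber programs perform with wrtsem and rdsem
reduces to a few semop calls. The sketch below collects them into helper
functions. The names sem_change, reader_enter, reader_leave, writer_enter,
and writer_leave are hypothetical, and each semaphore is assumed to be the
only member of its set and to start with the value 0.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Apply one operation to semaphore 0 of the set semid:
 * op > 0 adds to the value, op < 0 subtracts (waiting if the value
 * would go negative), and op == 0 waits for the value to become 0. */
int sem_change(int semid, int op)
{
    struct sembuf sop[1];

    sop[0].sem_num = 0;
    sop[0].sem_op = (short) op;
    sop[0].sem_flg = 0;
    return semop(semid, sop, 1);
}

/* Reader: wait until no writer is active, then count ourselves in. */
int reader_enter(int wrtsem, int rdsem)
{
    if (sem_change(wrtsem, 0) == -1)
        return -1;
    return sem_change(rdsem, 1);
}

/* Reader: finished with the index. */
int reader_leave(int rdsem)
{
    return sem_change(rdsem, -1);
}

/* Writer: block new readers, then wait for current readers to finish. */
int writer_enter(int wrtsem, int rdsem)
{
    if (sem_change(wrtsem, 1) == -1)
        return -1;
    return sem_change(rdsem, 0);
}

/* Writer: let readers back in. */
int writer_leave(int wrtsem)
{
    return sem_change(wrtsem, -1);
}
```

Note that a writer could in principle raise wrtsem between a reader's two
calls. Keeping both semaphores in one set and issuing both operations in
a single semop call is one way to close that window, because the
operations passed in one call are applied together.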



The example following, addscr(), illustrates two significant points about
curses screens:

1. Information read in from a curses window can be stored in fields
that are part of a structure defined in the header file for the applica- 
tion. 

2. The address of the structure can be passed from another function 
where the record is processed. 
/*  addscr is called from add-books.
 *  The user is prompted for title
 *  information.
 */

#include <curses.h>

WINDOW *cmdwin;

addscr(bb)
struct BOOK *bb;
{
    int c;

    initscr();
    nonl();
    noecho();
    cbreak();

    cmdwin = newwin(6, 40, 3, 20);
    mvprintw(0, 0, "This screen is for adding titles to the data base");
    mvprintw(1, 0, "Enter a to add; q to quit: ");
    refresh();
    for (;;)
    {
        refresh();
        c = getch();
        switch (c) {
        case 'a':
            /* Prompt for the title inside the boxed command window. */
            werase(cmdwin);
            box(cmdwin, '|', '-');
            mvwprintw(cmdwin, 1, 1, "Enter title: ");
            wmove(cmdwin, 2, 1);
            echo();
            wrefresh(cmdwin);
            wgetstr(cmdwin, bb->title);
            noecho();

            /* Prompt for the author in the same window. */
            werase(cmdwin);
            box(cmdwin, '|', '-');
            mvwprintw(cmdwin, 1, 1, "Enter author: ");
            wmove(cmdwin, 2, 1);
            echo();
            wrefresh(cmdwin);
            wgetstr(cmdwin, bb->author);
            noecho();
            werase(cmdwin);
            wrefresh(cmdwin);
            endwin();
            return(1);

        case 'q':
            erase();
            endwin();
            return(0);
        }
    }
}



#
# Makefile for liber library system
#

CC = cc
CFLAGS = -O

all: startup add-books checkout card-catalog

startup: liber.h startup.c
	$(CC) $(CFLAGS) -o startup startup.c

add-books: add-books.o addscr.o
	$(CC) $(CFLAGS) -o add-books add-books.o addscr.o

add-books.o: liber.h

checkout: liber.h checkout.c
	$(CC) $(CFLAGS) -o checkout checkout.c

card-catalog: liber.h card-catalog.c
	$(CC) $(CFLAGS) -o card-catalog card-catalog.c
Introduction



NOTE

This chapter describes the new version of awk released in UNIX System V
Release 3.1 and described in nawk(1). An earlier version is described in
awk(1). The new version will become the default in the next major UNIX
system release. Until then, you should read nawk for awk in this chapter.



Suppose you want to tabulate some survey results stored in a file, print 
various reports summarizing these results, generate form letters, reformat a 
data file for one application package to use with another package, or count 
the occurrences of a string in a file. awk is a programming language that
makes it easy to handle these and many other tasks of information retrieval 
and data processing. The name awk is an acronym constructed from the 
initials of its developers; it denotes the language and also the UNIX system 
command you use to run an awk program. 

awk is an easy language to learn. It automatically does quite a few 
things that you have to program for yourself in other languages. As a 
result, many useful awk programs are only one or two lines long. Because 
awk programs are usually smaller than equivalent programs in other 
languages, and because they are interpreted, not compiled, awk is also a 
good language for prototyping. 

The first part of this chapter introduces you to the basics of awk and is 
intended to make it easy for you to start writing and running your own 
awk programs. The rest of the chapter describes the complete language and 
is somewhat less tutorial. For the experienced awk user, there's a summary 
of the language at the end of the chapter. 

You should be familiar with the UNIX system and shell programming 
to use this chapter. Although you don't need other programming experi- 
ence, some knowledge of the C programming language is beneficial, 
because many constructs found in awk are also found in C. 
This section provides enough information for you to write and run
some of your own programs. Each topic presented is discussed in more 
detail in later sections. 



Program Structure 

The basic operation of awk(1) is to scan a set of input lines one after
another, searching for lines that match any of a set of patterns or conditions 
you specify. For each pattern, you can specify an action; this action is per- 
formed on each line that matches the pattern. Accordingly, an awk pro- 
gram is a sequence of pattern-action statements, as Figure 4-1 shows. 




pattern { action }
pattern { action }



Example: 



$1 == "address" { print $2, $3 } 




Figure 4-1: awk Program Structure and Example 



The example in the figure is a typical awk program, consisting of one 
pattern-action statement. The program prints the second and third fields of 
each input line whose first field is address. In general, awk programs work 
by matching each line of input against each of the patterns in turn. For 
each pattern that matches, the associated action (which may involve multi- 
ple steps) is executed. Then the next line is read and the matching starts 
over. This process typically continues until all the input has been read. 
Either the pattern or the action in a pattern-action statement may be
omitted. If there is no action with a pattern, as in 

$1 == "name" 

the matching line is printed. If there is no pattern with an action, as in 
{ print $1, $2 } 

the action is performed for every input line. Since patterns and actions are 
both optional, actions are enclosed in braces to distinguish them from pat- 
terns. 



Usage 

There are two ways to run an awk program. First, you can type the 
command line 

awk 'pattern-action statements' optional list of input files 

to execute the pattern-action statements on the set of named input files. For 
example, you could say 

awk '{ print $1, $2 }' file1 file2

Notice that the pattern-action statements are enclosed in single quotes. This 
protects characters like $ from being interpreted by the shell and also 
allows the program to be longer than one line. 

If no files are mentioned on the command line, awk(1) reads from the
standard input. You can also specify that input comes from the standard 
input by using the hyphen ( - ) as one of the input files. For example, 

awk '{ print $3, $4 }' file1 -

says to read input first from file1 and then from the standard input.

The arrangement above is convenient when the awk program is short (a 
few lines). If the program is long, it is often more convenient to put it into 
a separate file and use the -f option to fetch it: 

awk -f program file optional list of input files 

For example, the following command line says to fetch and execute mypro-
gram on input from the file file1:

awk -f myprogram file1
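
The two ways of running a program, and the use of the hyphen for the standard input, can be tried with a short shell session such as the following. This is only a sketch: the names file1 and myprogram follow the text, but their contents and the scratch directory are assumptions made for illustration.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
dir=$(mktemp -d)
cd "$dir" || exit 1

# file1: two fields per line (illustrative data, not from the guide).
printf 'alpha 1\nbeta 2\n' > file1

# Style 1: the program is the first argument, protected by single quotes.
inline=$(awk '{ print $2, $1 }' file1)

# Style 2: the same program stored in a file and fetched with -f.
printf '{ print $2, $1 }\n' > myprogram
fromfile=$(awk -f myprogram file1)

# A hyphen names the standard input: file1 is read first, then the pipe.
mixed=$(printf 'gamma 3\n' | awk '{ print $1 }' file1 -)

printf '%s\n' "$inline" "$fromfile" "$mixed"
```

Both styles run the same program, so the first two results are identical; the third shows the record from the standard input arriving after the records of file1.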
Fields

awk normally reads its input one line, or record, at a time; a record is, 
by default, a sequence of characters ending with a newline. awk then splits 
each record into fields, where, by default, a field is a string of non-blank, 
non-tab characters. 

As input for many of the awk programs in this chapter, we use the file 
countries, which contains information about the ten largest countries in the 
world. Each record contains the name of a country, its area in thousands of 
square miles, its population in millions, and the continent on which it is 
found. (Data are from 1978; the U.S.S.R. has been arbitrarily placed in 
Asia.) The white space between fields is a tab in the original input; a single 
blank separates North and South from America.



USSR        8650   262   Asia
Canada      3852   24    North America
China       3692   866   Asia
USA         3615   219   North America
Brazil      3286   116   South America
Australia   2968   14    Australia
India       1269   637   Asia
Argentina   1072   26    South America
Sudan       968    19    Africa
Algeria     920    18    Africa



Figure 4-2: The Sample Input File countries 



This file is typical of the kind of data awk is good at processing — a mix- 
ture of words and numbers separated into fields by blanks and tabs. 

The number of fields in a record is determined by the field separator. 
Fields are normally separated by sequences of blanks and/or tabs, so that
the first record of countries would have four fields, the second five, and so 
on. It's possible to set the field separator to just tab, so each line would 
have four fields, matching the meaning of the data; we'll show how to do 
this shortly. For the time being, we'll use the default: fields separated by 
blanks and/or tabs. The first field within a line is called $1, the second $2,
and so forth. The entire record is called $0. 
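
As a small illustration of default field splitting, the sketch below feeds the Canada record from Figure 4-2 to awk. The shell variable names are hypothetical, and NF (the built-in count of fields in the current record) is used here ahead of its description later in the chapter.

```shell
# One record from Figure 4-2; the fields are separated by tabs, and
# "North America" contains a single blank.
record=$(printf 'Canada\t3852\t24\tNorth America')

# With the default separator (blanks and/or tabs) the record has five fields.
default_nf=$(printf '%s\n' "$record" | awk '{ print NF }')

# $1 is the first field, $5 the fifth, and $0 the entire record.
first=$(printf '%s\n' "$record" | awk '{ print $1 }')
fifth=$(printf '%s\n' "$record" | awk '{ print $5 }')
whole=$(printf '%s\n' "$record" | awk '{ print $0 }')

printf '%s %s %s\n' "$default_nf" "$first" "$fifth"
```

The blank inside North America counts as a separator here, which is why the continent is spread over $4 and $5 until the field separator is set to a tab.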

Printing 

If the pattern in a pattern-action statement is omitted, the action is exe- 
cuted for all input lines. The simplest action is to print each line; you can 
accomplish this with an awk program consisting of a single print statement 

{ print } 

so the command line 

awk '{ print }' countries 

prints each line of countries, copying the file to the standard output. The 
print statement can also be used to print parts of a record; for instance, the 
program 

{ print $1, $3 } 

prints the first and third fields of each record. Thus 

awk '{ print $1, $3 }' countries 

produces as output the sequence of lines: 




USSR 262
Canada 24
China 866
USA 219
Brazil 116
Australia 14
India 637
Argentina 26
Sudan 19
Algeria 18
When printed, items separated by a comma in the print statement are
separated by the output field separator, which by default is a single blank. 
Each line printed is terminated by the output record separator, which by 
default is a newline. 



NOTE 



In the remainder of this chapter, we only show awk programs, without
the command line that invokes them. Each complete program can be run
either by enclosing it in quotes as the first argument of the awk com-
mand, or by putting it in a file and invoking awk with the -f flag, as dis-
cussed in "awk Command Usage." In an example, if no input is men-
tioned, the input is assumed to be the file countries.



Formatted Printing 

For more carefully formatted output, awk provides a C-like printf state- 
ment 

printf format, expr1, expr2, ..., exprn

which prints the exprs according to the specification in the string format.
For example, the awk program

{ printf "%10s %6d\n", $1, $3 }

prints the first field ($1) as a string of 10 characters (right justified), then a
space, then the third field ($3) as a decimal number in a six-character field,
then a newline (\n). With input from the file countries, this program prints
an aligned table:
      USSR    262
    Canada     24
     China    866
       USA    219
    Brazil    116
 Australia     14
     India    637
 Argentina     26
     Sudan     19
   Algeria     18



With printf, no output separators or newlines are produced automati- 
cally; you must create them yourself by using \n in the format specification. 
"The printf Statement" in this chapter contains a full description of printf . 



Simple Patterns

You can select specific records for printing or other processing by using
simple patterns. awk has three kinds of patterns. First, you can use pat-
terns called relational expressions that make comparisons. For example, the
operator == tests for equality. To print the lines for which the fourth field
equals the string Asia, we can use the program consisting of the single pat-
tern

$4 == "Asia" 

With the file countries as input, this program yields 

USSR 8650 262 Asia 
China 3692 866 Asia 
India 1269 637 Asia 

The complete set of comparisons is >, >=, <, <=, == (equal to) and !=
(not equal to). These comparisons can be used to test both numbers and
strings. For example, suppose we want to print only countries with a popu- 
lation greater than 100 million. The program 
$3 > 100

is all that is needed. (Remember that the third field in the file countries is 
the population in millions.) It prints all lines in which the third field 
exceeds 100. 

Second, you can use patterns called regular expressions that search for 
specified characters to select records. The simplest form of a regular expres- 
sion is a string of characters enclosed in slashes: 

/US/ 

This program prints each line that contains the (adjacent) letters US any- 
where; with the file countries as input, it prints 

USSR 8650 262 Asia 

USA 3615 219 North America 

We will have a lot more to say about regular expressions later in this 
chapter. 

Third, you can use two special patterns, BEGIN and END, that match 
before the first record has been read and after the last record has been pro- 
cessed. This program uses BEGIN to print a title: 

BEGIN { print "Countries of Asia:" } 
/Asia/ { print "   ", $1 }

The output is

Countries of Asia:
    USSR
    China
    India



Simple Actions 

We have already seen the simplest action of an awk program: printing 
each input line. Now let's consider how you can use built-in and user- 
defined variables and functions for other simple actions in a program. 
Built-in Variables

Besides reading the input and splitting it into fields, awk(1) counts the
number of records read and the number of fields within the current record; 
you can use these counts in your awk programs. The variable NR is the 
number of the current record, and NF is the number of fields in the record. 
So the program 

{ print NR, NF } 
prints the number of each line and how many fields it has, while 

{ print NR, $0 } 
prints each record preceded by its record number. 

User-defined Variables 

Besides providing built-in variables like NF and NR, awk lets you 
define your own variables, which you can use for storing data, doing arith- 
metic, and the like. To illustrate, consider computing the total population 
and the average population represented by the data in the file countries: 

    { sum = sum + $3 }
END { print "Total population is", sum, "million"
      print "Average population of", NR, "countries is", sum/NR }



NOTE 



awk initializes sum to zero before it is used. 



The first action accumulates the population from the third field; the second 
action, which is executed after the last input, prints the sum and average: 

Total population is 2201 million 

Average population of 10 countries is 220.1 
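
To reproduce this result, the following sketch rebuilds the countries file from Figure 4-2 in a temporary file and runs the program on it. The temporary file and the shell variable names are assumptions made for illustration; the awk program is the one shown above.

```shell
# Recreate Figure 4-2 with tab-separated fields in a temporary file.
countries=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    USSR 8650 262 Asia \
    Canada 3852 24 'North America' \
    China 3692 866 Asia \
    USA 3615 219 'North America' \
    Brazil 3286 116 'South America' \
    Australia 2968 14 Australia \
    India 1269 637 Asia \
    Argentina 1072 26 'South America' \
    Sudan 968 19 Africa \
    Algeria 920 18 Africa > "$countries"

# sum is never set explicitly; awk starts it at zero.
report=$(awk '
    { sum = sum + $3 }
END { print "Total population is", sum, "million"
      print "Average population of", NR, "countries is", sum/NR }
' "$countries")

printf '%s\n' "$report"
rm -f "$countries"
```

In the END action NR still holds the number of the last record read, so it serves as the count of countries.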

Functions 

awk has built-in functions that handle common arithmetic and string 
operations for you. For example, there's an arithmetic function that com- 
putes square roots. There is also a string function that substitutes one string 
for another. awk also lets you define your own functions. Functions are
described in detail in the section "Actions" in this chapter. 
A Handful of Useful One-liners

Although awk can be used to write large programs of some complexity, 
many programs are not much more complicated than what we've seen so 
far. Here is a collection of other short programs that you may find useful 
and instructive. They are not explained here, but any new constructs do 
appear later in this chapter. 

Print last field of each input line: 
{ print $NF } 

Print 10th input line: 
NR == 10 

Print last input line: 

{ line = $0} 
END { print line } 

Print input lines that don't have four fields: 

NF != 4 { print $0, "does not have 4 fields" }

Print input lines with more than four fields: 
NF > 4 

Print input lines with last field more than 4: 
$NF > 4 

Print total number of input lines: 
END { print NR } 

Print total number of fields: 

{ nf = nf + NF } 
END { print nf } 

Print total number of input characters: 

{ nc = nc + length($0) } 
END { print nc + NR } 
(Adding NR includes in the total the number of newlines.) 

Print the total number of lines that contain the string Asia: 

/Asia/ { nlines++ } 

END { print nlines } 
(The statement nlines++ has the same effect as nlines = nlines 
+ 1.) 
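
Several of these one-liners can be checked against the countries file. The sketch below is one way to do so; it assumes the file from Figure 4-2 is rebuilt in a temporary file, and the shell variable names are hypothetical.

```shell
# Recreate Figure 4-2 with tab-separated fields in a temporary file.
countries=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    USSR 8650 262 Asia \
    Canada 3852 24 'North America' \
    China 3692 866 Asia \
    USA 3615 219 'North America' \
    Brazil 3286 116 'South America' \
    Australia 2968 14 Australia \
    India 1269 637 Asia \
    Argentina 1072 26 'South America' \
    Sudan 968 19 Africa \
    Algeria 920 18 Africa > "$countries"

# Tenth input line, and last input line (the same record in this file).
tenth=$(awk 'NR == 10' "$countries")
last=$(awk '{ line = $0 } END { print line }' "$countries")

# Total number of fields, using the default field separator.
fields=$(awk '{ nf = nf + NF } END { print nf }' "$countries")

# Number of lines that contain the string Asia.
asia=$(awk '/Asia/ { nlines++ } END { print nlines }' "$countries")

# First field of each line that does not have exactly four fields.
odd=$(awk 'NF != 4 { print $1 }' "$countries")

printf '%s\n' "$tenth" "$fields" "$asia" "$odd"
rm -f "$countries"
```

Six records have four fields and four have five (those whose continent is two words), which gives 44 fields in all.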
Error Messages

If you make an error in your awk program, you generally get an error 
message. For example, trying to run the program 

$3 < 200 { print ( $1 } 

generates the error messages 



awk: syntax error at source line 1
 context is
        $3 < 200 { print ( >>> $1 } <<<
awk: illegal statement at source line 1
        1 extra (



Some errors may be detected while your program is running. For example, 
if you try to divide a number by zero, awk stops processing and reports the 
input record number (NR) and the line number in the program. 
In a pattern-action statement, the pattern is an expression that selects
the records for which the associated action is executed. This section 
describes the kinds of expressions that may be used as patterns. 



BEGIN and END 

BEGIN and END are two special patterns that give you a way to control 
initialization and wrap-up in an awk program. BEGIN matches before the 
first input record is read, so any statements in the action part of a BEGIN 
are done once, before the awk command starts to read its first input record. 
The pattern END matches the end of the input, after the last record has 
been processed. 

The following awk program uses BEGIN to set the field separator to tab 
(\t) and to put column headings on the output. The field separator is stored 
in a built-in variable called FS. Although FS can be reset at any time, usu- 
ally the only sensible place is in a BEGIN section, before any input has 
been read. The program's second printf statement, which is executed for 
each input line, formats the output into a table, neatly aligned under the 
column headings. The END action prints the totals. (Notice that a long
line can be continued after a comma.) 



BEGIN { FS = "\t"
        printf "%10s %6s %5s %s\n",
               "COUNTRY", "AREA", "POP", "CONTINENT" }
      { printf "%10s %6d %5d %s\n", $1, $2, $3, $4
        area = area + $2; pop = pop + $3 }
END   { printf "\n%10s %6d %5d\n", "TOTAL", area, pop }



With the file countries as input, this program produces 
   COUNTRY   AREA   POP CONTINENT
      USSR   8650   262 Asia
    Canada   3852    24 North America
     China   3692   866 Asia
       USA   3615   219 North America
    Brazil   3286   116 South America
 Australia   2968    14 Australia
     India   1269   637 Asia
 Argentina   1072    26 South America
     Sudan    968    19 Africa
   Algeria    920    18 Africa

     TOTAL  30292  2201
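
To try the BEGIN and END program end to end, the following sketch rebuilds the countries file in a temporary file and runs the program on it. The temporary file and the shell variable names are assumptions made for illustration, and the column headings are written in capitals to match the output shown.

```shell
# Recreate Figure 4-2 with tab-separated fields in a temporary file.
countries=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    USSR 8650 262 Asia \
    Canada 3852 24 'North America' \
    China 3692 866 Asia \
    USA 3615 219 'North America' \
    Brazil 3286 116 'South America' \
    Australia 2968 14 Australia \
    India 1269 637 Asia \
    Argentina 1072 26 'South America' \
    Sudan 968 19 Africa \
    Algeria 920 18 Africa > "$countries"

# BEGIN sets FS to a tab and prints the headings; the middle action
# formats each record and accumulates totals; END prints the totals.
table=$(awk '
BEGIN { FS = "\t"
        printf "%10s %6s %5s %s\n",
               "COUNTRY", "AREA", "POP", "CONTINENT" }
      { printf "%10s %6d %5d %s\n", $1, $2, $3, $4
        area = area + $2; pop = pop + $3 }
END   { printf "\n%10s %6d %5d\n", "TOTAL", area, pop }
' "$countries")

printf '%s\n' "$table"
rm -f "$countries"
```

Because FS is a tab, North America and South America each stay in a single field, so every record has exactly four fields.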





Relational Expressions 

An awk pattern can be any expression involving comparisons between
strings of characters or numbers. awk has six relational operators, and two
regular expression matching operators, ~ (tilde) and !~, which are discussed
in the next section, for making comparisons. Figure 4-3 shows these opera-
tors and their meanings.
Operator    Meaning

<           less than
<=          less than or equal to
==          equal to
!=          not equal to
>=          greater than or equal to
>           greater than
~           matches
!~          does not match



Figure 4-3: awk Comparison Operators 



In a comparison, if both operands are numeric, a numeric comparison is 
made; otherwise, the operands are compared as strings. (Every value might 
be either a number or a string; usually awk can tell what is intended. The 
section "Number or String?" contains more information about this.) Thus, 
the pattern $3>100 selects lines where the third field exceeds 100, and the 
program 

$1 >= "S" 

selects lines that begin with the letters S through Z, namely,

USSR        8650   262   Asia
USA         3615   219   North America
Sudan       968    19    Africa



In the absence of any other information, awk treats fields as strings, so 
the program 

$1 == $4 

compares the first and fourth fields as strings of characters, and with the file 
countries as input, prints the single line for which this test succeeds: 

Australia 2968 14 Australia 

If both fields appear to be numbers, the comparisons are done numerically. 
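
The difference between string and numeric comparison can be seen by running the three patterns from this section. The sketch below assumes the countries file is rebuilt in a temporary file; setting LC_ALL=C for the string comparison is a precaution added here so that letters are ordered by character code.

```shell
# Recreate Figure 4-2 with tab-separated fields in a temporary file.
countries=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    USSR 8650 262 Asia \
    Canada 3852 24 'North America' \
    China 3692 866 Asia \
    USA 3615 219 'North America' \
    Brazil 3286 116 'South America' \
    Australia 2968 14 Australia \
    India 1269 637 Asia \
    Argentina 1072 26 'South America' \
    Sudan 968 19 Africa \
    Algeria 920 18 Africa > "$countries"

# String comparison: first fields that sort at or after "S".
strings=$(LC_ALL=C awk '$1 >= "S" { print $1 }' "$countries")

# String comparison between two fields of the same record.
same=$(awk 'BEGIN { FS = "\t" } $1 == $4 { print $1 }' "$countries")

# Numeric comparison: the third field looks like a number.
numeric=$(awk '$3 > 100 { print $1 }' "$countries")

printf '%s\n' "$strings" "$same" "$numeric"
rm -f "$countries"
```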
Regular Expressions

awk provides more powerful patterns for searching for strings of char-
acters than the comparisons illustrated in the previous section. These pat-
terns are called regular expressions, and are like those in egrep(1) and
lex(1). The simplest regular expression is a string of characters enclosed in
slashes, like

/Asia/

This program prints all input records that contain the substring Asia. (If a
record contains Asia as part of a larger string like Asian or Pan-Asiatic, it
is also printed.) In general, if re is a regular expression, then the pattern

/re/

matches any line that contains a substring specified by the regular expres-
sion re.

To restrict a match to a specific field, you use the matching operators ~
(matches) and !~ (does not match). The program

$4 ~ /Asia/ { print $1 }

prints the first field of all lines in which the fourth field matches Asia,
while the program

$4 !~ /Asia/ { print $1 }

prints the first field of all lines in which the fourth field does not match
Asia.

In regular expressions, the symbols

\ ^ $ . [ ] | ( ) * + ?

are metacharacters with special meanings like the metacharacters in the
UNIX shell. For example, the metacharacters ^ and $ match the beginning
and end, respectively, of a string, and the metacharacter . ("dot") matches
any single character. Thus,

/^.$/

matches all records that contain exactly one character.
A group of characters enclosed in brackets matches any one of the
enclosed characters; for example, /[ABC]/ matches records containing any
one of A, B, or C anywhere. Ranges of letters or digits can be abbreviated
within brackets: /[a-zA-Z]/ matches any single letter.

If the first character after the [ is a ^, this complements the class so it
matches any character not in the set: /[^a-zA-Z]/ matches any non-letter.
The program

$2 !~ /^[0-9]+$/

prints all records in which the second field is not a string of one or more
digits (^ for beginning of string, [0-9]+ for one or more digits, and $ for
end of string). Programs of this nature are often used for data validation.

Parentheses ( ) are used for grouping and the symbol | is used for alter-
natives. The program

/(apple|cherry) (pie|tart)/

matches lines containing any one of the four substrings apple pie, apple
tart, cherry pie, or cherry tart.

To turn off the special meaning of a metacharacter, precede it by a \ 
(backslash). Thus, the program 

/b\$/ 

prints all lines containing b followed by a dollar sign. 
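
The three regular expressions just described can be exercised on a few lines of made-up input. In this sketch the input lines and the shell variable names are illustrative assumptions; only the patterns come from the text.

```shell
# Data validation: print records whose second field is not all digits.
bad=$(printf 'pens 12\ninks 1x\nnibs 7\n' | awk '$2 !~ /^[0-9]+$/')

# Grouping and alternation: any of the four dessert names.
pies=$(printf 'apple pie\ncherry tart\napple cake\n' |
       awk '/(apple|cherry) (pie|tart)/')

# A backslash turns off the special meaning of the dollar sign.
dollar=$(printf 'ab$c\nabc\n' | awk '/b\$/')

printf '%s\n' "$bad" "$pies" "$dollar"
```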

In addition to recognizing metacharacters, the awk command recognizes 
the following C programming language escape sequences within regular 
expressions and strings: 



\b      backspace
\f      formfeed
\n      newline
\r      carriage return
\t      tab
\ddd    octal value ddd
\"      quotation mark
\c      any other character c literally



For example, to print all lines containing a tab, use the program 
/\t/ 
awk interprets any string or variable on the right side of a ~ or !~ as a
regular expression. For example, we could have written the program

$2 !~ /^[0-9]+$/

as

BEGIN { digits = "^[0-9]+$" }
$2 !~ digits



Suppose you wanted to search for a string of characters like ^[0-9]+$.
When a literal quoted string like "^[0-9]+$" is used as a regular expression,
one extra level of backslashes is needed to protect regular expression meta-
characters. This is because one level of backslashes is removed when a
string is originally parsed. If a backslash is needed in front of a character to
turn off its special meaning in a regular expression, then that backslash
needs a preceding backslash to protect it in a string.

For example, suppose we want to match strings containing b followed
by a dollar sign. The regular expression for this pattern is b\$. If we want
to create a string to represent this regular expression, we must add one
more backslash: "b\\$". The two regular expressions on each of the follow-
ing lines are equivalent:

x ~ "b\\$"      x ~ /b\$/
x ~ "b\$"       x ~ /b$/
x ~ "b$"        x ~ /b$/
x ~ "\\t"       x ~ /\t/
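
A short sketch can confirm that a regular expression written as a string needs the extra backslash. The input lines and shell variable names below are assumptions chosen for illustration.

```shell
# The same match written as a regular expression constant and as a string.
const=$(printf 'ab$c\nabc\n' | awk '$0 ~ /b\$/')
str=$(printf 'ab$c\nabc\n' | awk '$0 ~ "b\\$"')

# A regular expression held in a variable, as in the digits example.
nondigit=$(printf 'a 10\nb 1o\n' | awk 'BEGIN { digits = "^[0-9]+$" }
                                        $2 !~ digits')

printf '%s\n' "$const" "$str" "$nondigit"
```

Both forms select only the line in which b is followed by a literal dollar sign; the string form loses one backslash when the string is parsed, leaving b\$ as the regular expression.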



The precise form of regular expressions and the substrings they match is
given in Figure 4-4. The unary operators *, +, and ? have the highest pre-
cedence, then concatenation, and then alternation |. All operators are left
associative. r stands for any regular expression.
Expression   Matches

c            any non-metacharacter c
\c           character c literally
^            beginning of string
$            end of string
.            any character but newline
[s]          any character in set s
[^s]         any character not in set s
r*           zero or more r's
r+           one or more r's
r?           zero or one r
(r)          r
r1r2         r1 then r2 (concatenation)
r1|r2        r1 or r2 (alternation)



Figure 4-4: awk Regular Expressions 



Combinations of Patterns 

A compound pattern combines simpler patterns with parentheses and
the logical operators || (or), && (and), and ! (not). For example, suppose
we want to print all countries in Asia with a population of more than 500
million. The following program does this by selecting all lines in which
the fourth field is Asia and the third field exceeds 500:

$4 == "Asia" && $3 > 500

The program

$4 == "Asia" || $4 == "Africa"

selects lines with Asia or Africa as the fourth field. Another way to write
the latter query is to use a regular expression with the alternation operator
|:

$4 ~ /^(Asia|Africa)$/

The negation operator ! has the highest precedence, then &&, and
finally ||. The operators && and || evaluate their operands from left to
right; evaluation stops as soon as truth or falsehood is determined. 
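
The compound patterns above can be compared on a few records. This sketch uses five records taken from Figure 4-2, written to a temporary file; that file and the shell variable names are assumptions made for illustration.

```shell
# Five tab-separated records taken from Figure 4-2.
sample=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    USSR 8650 262 Asia \
    Canada 3852 24 'North America' \
    China 3692 866 Asia \
    India 1269 637 Asia \
    Sudan 968 19 Africa > "$sample"

# Both conditions must hold.
big_asia=$(awk 'BEGIN { FS = "\t" }
                $4 == "Asia" && $3 > 500 { print $1 }' "$sample")

# Either condition may hold.
either=$(awk 'BEGIN { FS = "\t" }
              $4 == "Asia" || $4 == "Africa" { print $1 }' "$sample")

# The same query as one anchored regular expression with alternation.
regex=$(awk 'BEGIN { FS = "\t" }
             $4 ~ /^(Asia|Africa)$/ { print $1 }' "$sample")

printf '%s\n' "$big_asia" "$either"
rm -f "$sample"
```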

Pattern Ranges 

A pattern range consists of two patterns separated by a comma, as in

pat1, pat2 { ... }

In this case, the action is performed for each line between an occurrence of
pat1 and the next occurrence of pat2 (inclusive). As an example, the pattern

/Canada/, /Brazil/ 

matches lines starting with the first line that contains the string Canada up 
through the next occurrence of the string Brazil: 



Canada      3852   24    North America
China       3692   866   Asia
USA         3615   219   North America
Brazil      3286   116   South America



Similarly, since FNR is the number of the current record in the current 
input file (and FILENAME is the name of the current input file), the pro- 
gram 

FNR == 1, FNR == 5 { print FILENAME, $0 }

prints the first five records of each input file with the name of the current 
input file prepended. 
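
Pattern ranges can be tried on small inputs. In the sketch below the two files a and b, their contents, and the shell variable names are hypothetical; the FNR range is shortened to the first two records so that the effect is visible on three-line files.

```shell
# Two small input files in a scratch directory (illustrative data).
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a1\na2\na3\n' > a
printf 'b1\nb2\nb3\n' > b

# FNR restarts at 1 for each file, so the range matches in both files.
heads=$(awk 'FNR == 1, FNR == 2 { print FILENAME, $0 }' a b)

# A range delimited by two regular expressions, end points included.
range=$(printf 'x\nstart\ny\nstop\nz\n' | awk '/start/, /stop/')

printf '%s\n' "$heads" "$range"
```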
In a pattern-action statement, the action determines what is to be done
with the input records that the pattern selects. Actions frequently are sim- 
ple printing or assignment statements, but they may also be a combination 
of one or more statements. This section describes the statements that can 
make up actions. 



Built-in Variables 

Figure 4-5 lists the built-in variables that awk maintains. Some of these 
we have already met; others are used in this and later sections. 



Variable    Meaning                                        Default

ARGC        number of command-line arguments               -
ARGV        array of command-line arguments                -
FILENAME    name of current input file                     -
FNR         record number in current file                  -
FS          input field separator                          blank&tab
NF          number of fields in current record             -
NR          number of records read so far                  -
OFMT        output format for numbers                      %.6g
OFS         output field separator                         blank
ORS         output record separator                        newline
RS          input record separator                         newline
RSTART      index of first character matched by match()    -
RLENGTH     length of string matched by match()            -
SUBSEP      subscript separator                            "\034"



Figure 4-5: awk Built-in Variables 



Arithmetic 

Actions can use conventional arithmetic expressions to compute numeric 
values. As a simple example, suppose we want to print the population den- 
sity for each country in the file countries. Since the second field is the area 
in thousands of square miles and the third field is the population in 
millions, the expression 1000 * $3 / $2 gives the population density in peo-
ple per square mile. The program

{ printf "%10s %6.1f\n", $1, 1000 * $3 / $2 }

applied to the file countries prints the name of each country and its popula- 
tion density: 



      USSR   30.3
    Canada    6.2
     China  234.6
       USA   60.6
    Brazil   35.3
 Australia    4.7
     India  502.0
 Argentina   24.3
     Sudan   19.6
   Algeria   19.6




Arithmetic is done internally in floating point. The arithmetic operators
are +, -, *, /, % (remainder) and ^ (exponentiation; ** is a synonym).
Arithmetic expressions can be created by applying these operators to con- 
stants, variables, field names, array elements, functions, and other expres- 
sions, all of which are discussed later. Note that awk recognizes and pro- 
duces scientific (exponential) notation: 1e6, 1E6, 10e5, and 1000000 are 
numerically equal. 
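
The arithmetic described above can be checked with a short sketch. It uses three records from Figure 4-2 written to a temporary file; that file and the shell variable names are assumptions made for illustration.

```shell
# Three tab-separated records taken from Figure 4-2.
sample=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    USSR 8650 262 Asia \
    Canada 3852 24 'North America' \
    India 1269 637 Asia > "$sample"

# Population density in people per square mile, one decimal place.
density=$(awk '{ printf "%10s %6.1f\n", $1, 1000 * $3 / $2 }' "$sample")

# Exponentiation and remainder.
power=$(awk 'BEGIN { print 2 ^ 10, 17 % 5 }')

# Scientific notation denotes the same numbers as the plain form.
same=$(awk 'BEGIN { if (1e6 == 1000000 && 10e5 == 1E6) print "equal" }')

printf '%s\n' "$density" "$power" "$same"
rm -f "$sample"
```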

awk has assignment statements like those found in the C programming 
language. The simplest form is the assignment statement 

v = e

where v is a variable or field name, and e is an expression. For example, to
compute the number of Asian countries and their total population, we could
write

$4 == "Asia"   { pop = pop + $3; n = n + 1 }
END            { print "population of", n,
                 "Asian countries in millions is", pop }




Applied to countries, this program produces 

population of 3 Asian countries in millions is 1765 

The action associated with the pattern $4 == "Asia" contains two assign- 
ment statements, one to accumulate population and the other to count coun- 
tries. The variables are not explicitly initialized, yet everything works prop-
erly because awk initializes each variable with the string value "" and the
numeric value 0.

The assignments in the previous program can be written more concisely 
using the operators += and ++: 

$4 == "Asia" { pop += $3; ++n } 

The operator += is borrowed from the C programming language: 

pop += $3 
has the same effect as 

pop = pop + $3 

but the += operator is shorter and runs faster. The same is true of the ++ 
operator, which adds one to a variable. 

The abbreviated assignment operators are +=, -=, *=, /=, %=, and ^=.
Their meanings are similar:

v op= e

has the same effect as

v = v op e.

The increment operators are ++ and --. As in C they may be used as
prefix (++x) or postfix (x++) operators. If x is 1, then i=++x increments x,
then sets i to 2, while i=x++ sets i to 1, then increments x. An analogous
interpretation applies to prefix and postfix --.

Assignment and increment and decrement operators may all be used in 
arithmetic expressions. 

We use default initialization to advantage in the following program, 
which finds the country with the largest population: 

maxpop < $3 { maxpop = $3; country = $1 }
END         { print country, maxpop }

Note, however, that this program would not be correct if all values of $3
were negative. 
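
The following sketch runs the largest-population program and also traces the increment and abbreviated assignment operators. The sample file holds four records from Figure 4-2; it, the shell variable names, and the small BEGIN programs are assumptions made for illustration.

```shell
# Four tab-separated records taken from Figure 4-2.
sample=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    USSR 8650 262 Asia \
    China 3692 866 Asia \
    USA 3615 219 'North America' \
    India 1269 637 Asia > "$sample"

# maxpop is never set explicitly; it starts out as zero.
largest=$(awk 'maxpop < $3 { maxpop = $3; country = $1 }
               END         { print country, maxpop }' "$sample")

# Prefix ++ increments before the value is used, postfix ++ afterwards.
incr=$(awk 'BEGIN { x = 1; i = ++x; j = x++; print i, j, x }')

# v op= e has the same effect as v = v op e.
abbrev=$(awk 'BEGIN { v = 7; v *= 3; v -= 1; print v }')

printf '%s\n' "$largest" "$incr" "$abbrev"
rm -f "$sample"
```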

awk provides the built-in arithmetic functions shown in Figure 4-6. 



Function      Value Returned

atan2(y,x)    arctangent of y/x in the range -π to π
cos(x)        cosine of x, with x in radians
exp(x)        exponential function of x
int(x)        integer part of x truncated towards 0
log(x)        natural logarithm of x
rand()        random number between 0 and 1
sin(x)        sine of x, with x in radians
sqrt(x)       square root of x
srand(x)      x is new seed for rand()



Figure 4-6: awk Built-in Arithmetic Functions 



x and y are arbitrary expressions. The function rand() returns a
pseudo-random floating point number in the range (0,1), and srand(x) can be used
to set the seed of the generator. If srand() has no argument, the seed is
derived from the time of day.
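
As a rough illustration of the functions in Figure 4-6, the sketch below evaluates several of them in a BEGIN action. It assumes a POSIX shell; the values and variable names are made up for the example, and the random number is only checked against its range because its actual value depends on the implementation.

```shell
# Exercise several built-in arithmetic functions with fixed formats.
out=$(awk 'BEGIN {
    pi = atan2(0, -1)                       # arctangent of 0/-1 is pi
    printf "%.4f\n", pi
    printf "%d %d\n", int(3.9), int(-3.9)   # truncation is toward 0
    printf "%.1f\n", sqrt(16)
    printf "%.4f\n", exp(log(10))           # exp and log are inverses
    printf "%.1f\n", sin(pi / 2)
    srand(1); r = rand()                    # seeded pseudo-random number
    ok = (r >= 0 && r < 1)
    print ok
}')
echo "$out"
```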



Strings and String Functions 

A string constant is created by enclosing a sequence of characters inside 
quotation marks, as in "abc" or "hello, everyone". String constants may con- 
tain the C programming language escape sequences for special characters 
listed in "Regular Expressions" in this chapter. 

String expressions are created by concatenating constants, variables, 
field names, array elements, functions, and other expressions. The program 

{ print NR ":" $0 }

prints each record preceded by its record number and a colon, with no 
blanks. The three strings representing the record number, the colon, and 
the record are concatenated and the resulting string is printed. The con- 
catenation operator has no explicit representation other than juxtaposition. 






awk provides the built-in string functions shown in Figure 4-7. In this
table, r represents a regular expression (either as a string or as /r/), s and t
string expressions, and n and p integers.



Function                   Description
gsub(r, s)                 substitute s for r globally in current record,
                           return number of substitutions
gsub(r, s, t)              substitute s for r globally in string t,
                           return number of substitutions
index(s, t)                return position of string t in s, 0 if not present
length(s)                  return length of s
match(s, r)                return the position in s where r occurs,
                           0 if not present
split(s, a)                split s into array a on FS, return number of fields
split(s, a, r)             split s into array a on r, return number of fields
sprintf(fmt, expr-list)    return expr-list formatted according to format
                           string fmt
sub(r, s)                  substitute s for first r in current record,
                           return number of substitutions
sub(r, s, t)               substitute s for first r in t, return number of
                           substitutions
substr(s, p)               return suffix of s starting at position p
substr(s, p, n)            return substring of s of length n starting at
                           position p

Figure 4-7: awk Built-in String Functions



The functions sub and gsub are patterned after the substitute command
in the text editor ed(1). The function gsub(r,s,t) replaces successive
occurrences of substrings matched by the regular expression r with the
replacement string s in the target string t. (As in ed, the leftmost match is
used, and is made as long as possible.) It returns the number of substitutions
made. The function gsub(r,s) is a synonym for gsub(r,s,$0). For
example, the program

{ gsub(/USA/, "United States"); print } 

transcribes its input, replacing occurrences of USA by United States. The 
sub functions are similar, except that they only replace the first matching 
substring in the target string. 
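
The difference between the two functions, and the use of the returned count, can be seen in a small sketch such as the following. It assumes a POSIX shell; the input line and the variable names line and n are invented for the example.

```shell
# gsub on the current record versus sub on a named target string.
out=$(printf 'USA and USA again\n' | awk '{
    line = $0
    n = gsub(/USA/, "United States")    # acts on $0 and returns the count
    print n ": " $0
    sub(/USA/, "US", line)              # replaces only the first match in line
    print line
}')
echo "$out"
```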

The function index(s, t) returns the leftmost position where the string t
begins in s, or zero if t does not occur in s. The first character in a string is
at position 1. For example,

index("banana", "an")

returns 2. 

The length function returns the number of characters in its argument 
string; thus, 

{ print length($0), $0 }

prints each record, preceded by its length. ($0 does not include the input
record separator.) The program

length($1) > max { max = length($1); name = $1 }
END { print name }

applied to the file countries prints the longest country name: Australia. 

The match(s, r) function returns the position in string s where regular
expression r occurs, or 0 if it does not occur. This function also sets two
built-in variables, RSTART and RLENGTH. RSTART is set to the starting
position of the match in the string; this is the same value as the returned
value. RLENGTH is set to the length of the matched string. (If a match
does not occur, RSTART is 0, and RLENGTH is -1.) For example, the following
program finds the first occurrence of the letter i followed by at
most one character followed by the letter a in a record:

{ if (match($0, /i.?a/)) 

print RSTART, RLENGTH, $0 } 

It produces the following output on the file countries: 

17 2 USSR       8650  262  Asia
26 3 Canada     3852  24   North America
3 3 China       3692  866  Asia
24 3 USA        3615  219  North America
27 3 Brazil     3286  116  South America
8 2 Australia   2968  14   Australia
4 2 India       1269  637  Asia
7 3 Argentina   1072  26   South America
17 3 Sudan      968   19   Africa
6 2 Algeria     920   18   Africa

NOTE

match() matches the left-most longest matching string. For example, with
the record

AsiaaaAsiaaaaan

as input, the program

{ if (match($0, /a+/)) print RSTART, RLENGTH, $0 }

matches the first string of a's and sets RSTART to 4 and RLENGTH to 3.
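
A common use of RSTART and RLENGTH is to hand them to substr to extract the matched text. The sketch below assumes a POSIX shell and an invented input line; it also shows the values left behind by a failed match.

```shell
# Extract the first run of digits, then show the result of a failed match.
out=$(printf 'order id=4711 shipped\n' | awk '{
    if (match($0, /[0-9]+/))
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    match($0, /z/)                    # there is no z in the record
    print RSTART, RLENGTH
}')
echo "$out"
```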

The function sprintf(format, expr1, expr2, ..., exprn) returns (without
printing) a string containing expr1, expr2, ..., exprn formatted according to
the printf specifications in the string format. "The printf Statement" in this
chapter contains a complete specification of the format conventions. The
statement

x = sprintf("%10s %6d", $1, $2)

assigns to x the string produced by formatting the values of $1 and $2 as a 
ten-character string and a decimal number in a field of width at least six; x 
may be used in any subsequent computation. 

The function substr(s,p,n) returns the substring of s that begins at position
p and is at most n characters long. If substr(s, p) is used, the substring
goes to the end of s; that is, it consists of the suffix of s beginning at position
p. For example, we could abbreviate the country names in countries to
their first three characters by invoking the program

{ $1 = substr($1, 1, 3); print }

on this file to produce

USS 8650 262 Asia
Can 3852 24 North America
Chi 3692 866 Asia
USA 3615 219 North America
Bra 3286 116 South America
Aus 2968 14 Australia
Ind 1269 637 Asia
Arg 1072 26 South America
Sud 968 19 Africa
Alg 920 18 Africa


Note that setting $1 in the program forces awk to recompute $0 and, there- 
fore, the fields are separated by blanks (the default value of OFS), not by 
tabs. 

Strings are stuck together (concatenated) merely by writing them one
after another in an expression. For example, when invoked on file countries,

{ s = s substr($1, 1, 3) " " }
END { print s }

prints

USS Can Chi USA Bra Aus Ind Arg Sud Alg

by building s up a piece at a time from an initially empty string.



Field Variables

The fields of the current record can be referred to by the field variables
$1, $2, ..., $NF. Field variables share all of the properties of other
variables: they may be used in arithmetic or string operations, and they may
have values assigned to them. So, for example, you can divide the second
field of the file countries by 1000 to convert the area from thousands to
millions of square miles:

{ $2 /= 1000; print } 

or assign a new string to a field: 

BEGIN { FS = OFS = "\t" }

$4 == "North America" { $4 = "NA" } 
$4 == "South America" { $4 = "SA" } 

{ print } 

The BEGIN action in this program resets the input field separator FS and the 
output field separator OFS to a tab. Notice that the print in the fourth line 
of the program prints the value of $0 after it has been modified by previous 
assignments. 

Fields can be accessed by expressions. For example, $(NF-1) is the
second to last field of the current record. The parentheses are needed: the
value of $NF-1 is 1 less than the value in the last field.

A field variable referring to a nonexistent field, for example, $(NF+1), 
has as its initial value the empty string. A new field can be created, how- 
ever, by assigning a value to it. For example, the following program 
invoked on the file countries creates a fifth field giving the population den- 
sity: 

BEGIN { FS = OFS = "\t" }

{ $5 = 1000 * $3 / $2; print } 

The number of fields can vary from record to record, but there is usu- 
ally an implementation limit of 100 fields per record. 
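
The points about field expressions and field creation can be tried on a one-line input. This is a sketch under the assumption of a POSIX shell and default field separators; the input record is invented.

```shell
# $(NF-1) versus $NF-1, and creation of a new field by assignment.
out=$(printf 'a b c 10\n' | awk '{
    print $(NF-1)        # second to last field
    print $NF-1          # one less than the value of the last field
    $(NF+1) = "new"      # assigning past NF creates a field and rebuilds $0
    print NF ": " $0
}')
echo "$out"
```

Because the assignment rebuilds $0, the fields in the last line are joined with OFS, a single blank by default.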



Number or String?

Variables, fields and expressions can have both a numeric value and a 
string value. They take on numeric or string values according to context. 
For example, in the context of an arithmetic expression like 

pop += $3 

pop and $3 must be treated numerically, so their values will be coerced to 
numeric type if necessary. 

In a string context like 

print $1 ":" $2 

$1 and $2 must be strings to be concatenated, so they will be coerced if 
necessary. 

In an assignment v = e or v op= e, the type of v becomes the type of e.
In an ambiguous context like

$1 == $2

the type of the comparison depends on whether the fields are numeric or
string, and this can only be determined when the program runs; it may well
differ from record to record.

In comparisons, if both operands are numeric, the comparison is
numeric; otherwise, operands are coerced to strings, and the comparison is
made on the string values. All field variables are of type string; in addition,
each field that contains only a number is also considered numeric. This
determination is done at run time. For example, the comparison "$1 == $2"
will succeed on any pair of the inputs

1 1.0 +1 0.1e+1 10E-1 001

but fail on the inputs

(null)   0
(null)   0.0
0a       0
1e50     1.0e50

There are two idioms for coercing an expression of one type to the
other:

number ""     concatenate a null string to a number to coerce it
              to type string
string + 0    add zero to a string to coerce it to type numeric

Thus, to force a string comparison between two fields, say

$1 "" == $2 ""

The numeric value of a string is the value of any prefix of the string
that looks numeric; thus the value of 12.34x is 12.34, while the value of
x12.34 is zero. The string value of an arithmetic expression is computed by
formatting the string with the output format conversion OFMT.

Uninitialized variables have numeric value 0 and string value "".
Nonexistent fields and fields that are explicitly null have only the string
value ""; they are not numeric.
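
The two coercion idioms can be observed directly. The sketch below assumes a POSIX shell and an invented input record whose two fields are numerically equal but textually different.

```shell
# Numeric comparison, forced string comparison, and numeric prefixes.
out=$(printf '1 1.0\n' | awk '{
    print ($1 == $2)          # both fields look numeric, so 1 equals 1.0
    print ($1 "" == $2 "")    # forced string comparison of "1" and "1.0"
    print "12.34x" + 0        # the numeric prefix of the string
    print "x12.34" + 0        # no numeric prefix at all
}')
echo "$out"
```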

Control Flow Statements 

awk provides if-else, while, do-while, and for statements, and statement
grouping with braces, as in the C programming language.

The if statement syntax is

if (expression) statement1 else statement2

The expression acting as the conditional has no restrictions; it can include the
relational operators <, <=, >, >=, ==, and !=; the regular expression
matching operators ~ and !~; the logical operators ||, &&, and !; juxtaposition
for concatenation; and parentheses for grouping.

In the if statement, the expression is first evaluated. If it is non-zero and
non-null, statement1 is executed; otherwise statement2 is executed. The else
part is optional.

A single statement can always be replaced by a statement list enclosed 
in braces. The statements in the statement list are terminated by newlines 
or semicolons. 

Rewriting the maximum population program from "Arithmetic Functions"
with an if statement results in

{
    if (maxpop < $3) {
        maxpop = $3
        country = $1
    }
}
END { print country, maxpop }

The while statement is exactly that of the C programming language:

while (expression) statement

The expression is evaluated; if it is non-zero and non-null the statement is
executed and the expression is tested again. The cycle repeats as long as the
expression is non-zero. For example, to print all input fields one per line,

{
    i = 1
    while (i <= NF) {
        print $i
        i++
    }
}

The for statement is like that of the C programming language:

for (expression1; expression; expression2) statement

It has the same effect as

expression1
while (expression) {
    statement
    expression2
}

so

{ for (i = 1; i <= NF; i++) print $i }

does the same job as the while example above. An alternate version of the
for statement is described in the next section.

The do statement has the form

do statement while (expression)

The statement is executed repeatedly until the value of the expression
becomes zero. Because the test takes place after the execution of the
statement (at the bottom of the loop), it is always executed at least once. As a
result, the do statement is used much less often than while or for, which
test for completion at the top of the loop.

The following example of a do statement prints all lines except those
between start and stop.

/start/ {
    do {
        getline x
    } while (x !~ /stop/)
}
{ print }

The break statement causes an immediate exit from an enclosing while
or for; the continue statement causes the next iteration to begin. The next 
statement causes awk to skip immediately to the next record and begin 
matching patterns starting from the first pattern-action statement. 

The exit statement causes the program to behave as if the end of the 
input had occurred; no more input is read, and the END action, if any, is 
executed. Within the END action, 

exit expr 

causes the program to return the value of expr as its exit status. If there is 
no expr, the exit status is zero. 
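
The interplay of next, exit, and the END action can be sketched as follows, assuming a POSIX shell. The input lines are invented, and the shell variable status is only used to capture the exit status of awk.

```shell
# next skips a record; exit stops reading but still runs the END action.
status=0
out=$(printf '# skipped\n1\n2\n3\n' | awk '
    /^#/    { next }          # go straight to the next record
            { sum += $1 }
    sum > 2 { exit }          # stop reading input; END is still executed
    END     { print "sum " sum; exit sum }
') || status=$?
echo "$out (exit status $status)"
```

The third data line is never added, because the exit statement is reached as soon as the sum exceeds 2.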



Arrays 

awk provides one-dimensional arrays. Arrays and array elements need 
not be declared; like variables, they spring into existence by being men- 
tioned. An array subscript may be a number or a string. 

As an example of a conventional numeric subscript, the statement 

x[NR] = $0 

assigns the current input line to the NRth element of the array x . In fact, it 
is possible in principle (though perhaps slow) to read the entire input into 
an array with the awk program 

{ x[NR] = $0 } 
END { . . . processing . . . } 

The first action merely records each input line in the array x, indexed by 
line number; processing is done in the END statement. 

Array elements may also be named by nonnumeric values. For example, 
the following program accumulates the total population of Asia and Africa 
into the associative array pop. The END action prints the total population of 
these two continents. 

/Asia/   { pop["Asia"] += $3 }
/Africa/ { pop["Africa"] += $3 }
END      { print "Asian population in millions is", pop["Asia"]
           print "African population in millions is",
               pop["Africa"] }

On the file countries, this program generates

Asian population in millions is 1765
African population in millions is 37

In this program if we had used pop[Asia] instead of pop["Asia"] the
expression would have used the value of the variable Asia as the subscript
and since the variable is uninitialized, the values would have been accumulated
in pop[""].

Suppose our task is to determine the total area in each continent of the 
file countries. Any expression can be used as a subscript in an array refer- 
ence. Thus 

area[$4] += $2 

uses the string in the fourth field of the current input record to index the 
array area and in that entry accumulates the value of the second field: 



BEGIN { FS = "\t" }
      { area[$4] += $2 }
END   { for (name in area)
            print name, area[name] }

Invoked on the file countries, this program produces

Africa 1888
North America 7467
South America 4358
Asia 13611
Australia 2968

This program uses a form of the for statement that iterates over all
defined subscripts of an array:

for (i in array) statement

executes statement with the variable i set in turn to each value of i for which
array[i] has been defined. The loop is executed once for each defined
subscript, and the subscripts are chosen in a random order. Results are
unpredictable when i or array is altered during the loop.

awk does not provide multi-dimensional arrays, but it does permit a list
of subscripts. They are combined into a single subscript with the values
separated by an unlikely string (stored in the variable SUBSEP). For example,

for (i = 1; i <= 10; i++)
    for (j = 1; j <= 10; j++)
        arr[i,j] = ...

creates an array which behaves like a two-dimensional array; the subscript
is the concatenation of i, SUBSEP, and j.

You can determine whether a particular subscript i occurs in an array arr
by testing the condition i in arr, as in

if ("Africa" in area) ...

This condition performs the test without the side effect of creating
area["Africa"], which would happen if we used

if (area["Africa"] != "") ...

Note that neither is a test of whether the array area contains an element
with value "Africa".
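
Both points, the combined subscript and the side effect of a plain reference, can be checked with a sketch like the one below. It assumes a POSIX shell; the array arr, the subscript "ghost", and the helper variables are invented for the example.

```shell
# Subscript lists, SUBSEP, and the difference between "in" and a reference.
out=$(awk 'BEGIN {
    arr[2, 3] = "x"
    for (k in arr) {                  # one element, so loop order is moot
        split(k, part, SUBSEP)        # recover the two subscripts
        print part[1], part[2], arr[k]
    }
    a = ((2, 3) in arr); b = ((3, 2) in arr)
    print a, b
    before = ("ghost" in arr)         # the in test creates nothing
    if (arr["ghost"] != "") print "not reached"
    after = ("ghost" in arr)          # the reference above created it
    print before
    print after
}')
echo "$out"
```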



It is also possible to split any string into fields in the elements of an
array using the built-in function split. The function 

split("s1:s2:s3", a, ":") 

splits the string s1:s2:s3 into three fields, using the separator : , and 
stores s1 in a[1], s2 in a[2], and s3 in a[3] . The number of fields 
found, here three, is returned as the value of split. The third argument of 
split is a regular expression to be used as the field separator. If the third 
argument is missing, FS is used as the field separator. 

An array element may be deleted with the delete statement: 

delete arrayname[subscript]



User-Defined Functions 

awk provides user-defined functions. A function is defined as 

function name(argument-list) {
    statements
}

The definition can occur anywhere a pattern-action statement can. The 
argument list is a list of variable names separated by commas; within the 
body of the function these variables refer to the actual parameters when the 
function is called. There must be no space between the function name and 
the left parenthesis of the argument list when the function is called; other- 
wise it looks like a concatenation. For example, the following program 
defines and tests the usual recursive factorial function (of course, using 
some input other than the file countries): 

function fact(n) {
    if (n <= 1)
        return 1
    else
        return n * fact(n-1)
}
{ print $1 "! is " fact($1) }


Array arguments are passed by reference, as in C, so it is possible for 
the function to alter array elements or create new ones. Scalar arguments 
are passed by value, however, so the function cannot affect their values out- 
side. Within a function, formal parameters are local variables but all other 
variables are global. (You can have any number of extra formal parameters 
that are used purely as local variables.) The return statement is optional, 
but the returned value is undefined if it is not included. 
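
The parameter-passing rules can be illustrated with a small sketch, assuming a POSIX shell. The function fill and the names sq, n, and i are hypothetical; the extra parameter i exists only to act as a local variable.

```shell
# Arrays are passed by reference, scalars by value; extra parameters are local.
out=$(awk '
function fill(a, n,    i) {       # i is an extra parameter used as a local
    for (i = 1; i <= n; i++)
        a[i] = i * i              # changes the array owned by the caller
    n = 0                         # changes only the local copy of n
}
BEGIN {
    i = "untouched"; n = 3
    fill(sq, n)
    print sq[1], sq[2], sq[3], n, i
}')
echo "$out"
```

Separating the extra parameters from the real ones with additional blanks is only a convention; awk does not require it.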



Some Lexical Conventions 

Comments may be placed in awk programs: they begin with the char- 
acter # and end at the end of the line, as in 

print x, y # this is a comment

Statements in an awk program normally occupy a single line. Several 
statements may occur on a single line if they are separated by semicolons. 
A long statement may be continued over several lines by terminating each 
continued line by a backslash. (It is not possible to continue a string.) 
This explicit continuation is rarely necessary, however, since statements 
continue automatically if the line ends with a comma (for example, as might 
occur in a print or printf statement) or after the operators && and ||.

Several pattern-action statements may appear on a single line if 
separated by semicolons. 



The print and printf statements are the two primary constructs that
generate output. The print statement is used to generate simple output; 
printf is used for more carefully formatted output. Like the shell, awk lets 
you redirect output, so that output from print and printf can be directed to 
files and pipes. This section describes the use of these two statements. 



The print Statement 

The statement 

print expr1, expr2, ..., exprn

prints the string value of each expression separated by the output field 
separator followed by the output record separator. The statement 

print 
is an abbreviation for 

print $0 
To print an empty line use

print ""

Output Separators 

The output field separator and record separator are held in the built-in 
variables OFS and ORS. Initially, OFS is set to a single blank and ORS to a 
single newline, but these values can be changed at any time. For example, 
the following program prints the first and second fields of each record with 
a colon between the fields and two newlines after the second field:

BEGIN { OFS = ":"; ORS = "\n\n" }

{ print $1, $2 } 

Notice that 

{ print $1 $2 } 

prints the first and second fields with no intervening output field separator, 
because $1 $2 is a string consisting of the concatenation of the first two 
fields. 
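
The effect of the comma versus juxtaposition is easy to see when the separators are set to visible characters. The following sketch assumes a POSIX shell; the separator values and the input record are chosen only to make the output readable.

```shell
# OFS appears only where print has a comma; ORS ends every print statement.
out=$(printf 'a b\n' | awk '
BEGIN { OFS = ":"; ORS = ";\n" }
      { print $1, $2        # the comma inserts OFS
        print $1 $2 }       # juxtaposition concatenates, no OFS
')
echo "$out"
```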



The printf Statement

awk's printf statement is the same as that in C except that the * format
specifier is not supported. The printf statement has the general form

printf format, expr1, expr2, ..., exprn

where format is a string that contains both information to be printed and
specifications on what conversions are to be performed on the expressions
in the argument list, as in Figure 4-8. Each specification begins with a %,
ends with a letter that determines the conversion, and may include

-        left-justify expression in its field
width    pad field to this width as needed; fields that begin
         with a leading 0 are padded with zeros
.prec    maximum string width or digits to right of
         decimal point



Character    Prints Expression as
c            single character
d            decimal number
e            [-]d.ddddddE[+-]dd
f            [-]ddd.dddddd
g            e or f conversion, whichever is shorter, with
             nonsignificant zeros suppressed
o            unsigned octal number
s            string
x            unsigned hexadecimal number
%            print a %; no argument is converted

Figure 4-8: awk printf Conversion Characters

Here are some examples of printf statements along with the corresponding
output:

printf "%d", 99/2                  49
printf "%e", 99/2                  4.950000e+01
printf "%f", 99/2                  49.500000
printf "%6.2f", 99/2                49.50
printf "%g", 99/2                  49.5
printf "%o", 99                    143
printf "%06o", 99                  000143
printf "%x", 99                    63
printf "|%s|", "January"           |January|
printf "|%10s|", "January"         |   January|
printf "|%-10s|", "January"        |January   |
printf "|%.3s|", "January"         |Jan|
printf "|%10.3s|", "January"       |       Jan|
printf "|%-10.3s|", "January"      |Jan       |
printf "%%"                        %



The default output format of numbers is %.6g; this can be changed by 
assigning a new value to OFMT. OFMT also controls the conversion of 
numeric values to strings for concatenation and creation of array subscripts. 
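
Several conversions can be combined in one format, and the effect of OFMT on print can be observed separately. This is only a sketch, assuming a POSIX shell; the values are taken from the examples above, and the OFMT setting is an arbitrary choice.

```shell
# One printf with several conversions, then print under a changed OFMT.
out=$(awk 'BEGIN {
    printf "|%-10.3s|%6.2f|%06o|%x|\n", "January", 99/2, 99, 99
    OFMT = "%.2f"
    print 3.14159, 17        # integers are printed as integers regardless
}')
echo "$out"
```

Note that in awk implementations that follow the later POSIX specification, conversion for concatenation and array subscripts is controlled by a separate variable, CONVFMT, rather than by OFMT.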



Output into Files 

It is possible to print output into files instead of to the standard output 
by using the > and >> redirection operators. For example, the following
program invoked on the file countries prints all lines where the population 
(third field) is bigger than 100 into a file called bigpop, and all other lines 
into smallpop: 

$3 > 100 { print $1, $3 >"bigpop" } 
$3 <= 100 { print $1, $3 >"smallpop" } 

Notice that the file names have to be quoted; without quotes, bigpop and
smallpop are merely uninitialized variables. If the output file names were
created by an expression, they would also have to be enclosed in
parentheses:

$4 ~ /North America/ { print $1 > ("tmp" FILENAME) }

This is because the > operator has higher precedence than concatenation;
without parentheses, the concatenation of tmp and FILENAME would not
work.



NOTE 



Files are opened once in an awk program. If > is used to open a file,
its original contents are overwritten. But if >> is used to open a file,
its contents are preserved and the output is appended to the file. Once
the file has been opened, the two operators have the same effect.



Output into Pipes 

It is also possible to direct printing into a pipe with a command on the 
other end, instead of into a file. The statement 

print | "command-line"

causes the output of print to be piped into the command-line. 

Although we have shown them here as literal strings enclosed in 
quotes, the command-line and file names can come from variables and the 
return values from functions, for instance. 

Suppose we want to create a list of continent-population pairs, sorted 
alphabetically by continent. The awk program below accumulates the 
population values in the third field for each of the distinct continent names 
in the fourth field in an array called pop. Then it prints each continent and 
its population, and pipes this output into the sort command. 

BEGIN { FS = "\t" }
      { pop[$4] += $3 }
END   { for (c in pop)
            print c ":" pop[c] | "sort" }

Invoked on the file countries, this program yields

Africa:37
Asia:1765
Australia:14
North America:243
South America:142



In all of these print statements involving redirection of output, the files 
or pipes are identified by their names (that is, the pipe above is literally 
named sort ), but they are created and opened only once in the entire run. 
So, in the last example, for all c in pop, only one sort pipe is open. 

There is a limit to the number of files that can be open simultaneously. 
The statement close(file) closes a file or pipe; file is the string used to create
it in the first place, as in 

close ("sort") 

When opening or closing a file, different strings are different commands. 
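
The following sketch shows one pipe being shared by every print statement and then closed explicitly. It assumes a POSIX shell with the sort command available; the input words are invented.

```shell
# All records go to a single sort process, which is closed in the END action.
out=$(printf 'pear\napple\nfig\n' | awk '
    { print | "sort" }               # the same pipe is reused for each record
END { close("sort")                  # sort writes its output when closed
      print "sorted " NR " lines" }')
echo "$out"
```

Closing the pipe before the final print makes awk wait for sort to finish, so the summary line comes after the sorted output.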



The most common way to give input to an awk program is to name on
the command line the file(s) that contains the input. This is the method 
we've been using in this chapter. However, there are several other methods 
we could use, each of which this section describes. 



Files and Pipes 

You can provide input to an awk program by putting the input data 
into a file, say awkdata, and then executing 

awk 'program' awkdata 

awk reads its standard input if no file names are given (see "Usage" in this
chapter); thus, a second common arrangement is to have another program
pipe its output into awk. For example, egrep(1) selects input lines containing
a specified regular expression, but it can do so faster than awk since this
is the only thing it does. We could, therefore, invoke the pipe

egrep 'Asia' countries | awk '...'

egrep quickly finds the lines containing Asia and passes them on to the 
awk program for subsequent processing. 



Input Separators 

With the default setting of the field separator FS, input fields are 
separated by blanks or tabs, and leading blanks are discarded, so each of 
these lines has the same first field: 

field1    field2
    field1
field1

When the field separator is a tab, however, leading blanks are not discarded.

The field separator can be set to any regular expression by assigning a
value to the built-in variable FS. For example,

BEGIN { FS = "(,[ \\t]*)|([ \\t]+)" }

sets it to an optional comma followed by any number of blanks and tabs.
FS can also be set on the command line with the -F argument:

awk -F'(,[ \t]*)|([ \t]+)' '...'

behaves the same as the previous example. Regular expressions used as
field separators match the left-most longest occurrences (as in sub()), but do
not match null strings.

Multi-line Records 

Records are normally separated by newlines, so that each line is a 
record, but this too can be changed, though only in a limited way. If the 
built-in record separator variable RS is set to the empty string, as in 

BEGIN { RS = "" }

then input records can be several lines long; a sequence of empty lines
separates records. A common way to process multiple-line records is to use

BEGIN { RS = ""; FS = "\n" }

to set the record separator to an empty line and the field separator to a new- 
line. There is a limit, however, on how long a record can be; it is usually 
about 2500 characters. "The getline Function" and "Cooperation with the 
Shell" in this chapter show other examples of processing multi-line records. 
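
A minimal sketch of this arrangement follows, assuming a POSIX shell. The two-line records (a name followed by a telephone number) are invented for the example.

```shell
# Records are separated by an empty line; each line of a record is one field.
out=$(printf 'Ann\n555-1234\n\nBob\n555-9876\n' |
    awk 'BEGIN { RS = ""; FS = "\n" }
         { print NR ": " $1 " <" $2 ">" }')
echo "$out"
```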

The getline Function 

awk's facility for automatically breaking its input into records that are 
more than one line long is not adequate for some tasks. For example, if 
records are not separated by blank lines, but by something more compli- 
cated, merely setting RS to null doesn't work. In such cases, it is necessary 
to manage the splitting of each record into fields in the program. Here are 
some suggestions. 

The function getline can be used to read input either from the current
input or from a file or pipe, by redirection analogous to printf. By itself,
getline fetches the next input record and performs the normal field-splitting
operations on it. It sets NF, NR, and FNR. getline returns 1 if there was a
record present, 0 if the end-of-file was encountered, and -1 if some error
occurred (such as failure to open a file).

To illustrate, suppose we have input data consisting of multi-line 
records, each of which begins with a line beginning with START and ends 
with a line beginning with STOP. The following awk program processes 
these multi-line records, a line at a time, putting the lines of the record into 
consecutive entries of an array 

f[1] f[2] ... f[nf] 

Once the line containing STOP is encountered, the record can be processed
from the data in the f array:

/^START/ {
    f[nf=1] = $0
    while (getline && $0 !~ /^STOP/)
        f[++nf] = $0
    # now process the data in f[1]...f[nf]
}

Notice that this code uses the fact that && evaluates its operands left to
right and stops as soon as one is false.

The same job can also be done by the following program: 

/^START/ && nf==0 { f[nf=1] = $0 }
nf > 1            { f[++nf] = $0 }
/^STOP/           { # now process the data in f[1]...f[nf]
                    nf = 0
                  }

The statement 
getline x 

reads the next record into the variable x. No splitting is done; NF is not 
set. The statement 

getline <"file" 

reads from file instead of the current input. It has no effect on NR or 
FNR, but field splitting is performed and NF is set. The statement 

getline x <"file" 

gets the next record from file into x; no splitting is done, and NF, NR and 
FNR are untouched. 

If a filename is an expression, it should be in parentheses for evaluation: 



while ( getline x < (ARGV[1] ARGV[2]) ) { ... }

This is because the < has precedence over concatenation. Without
parentheses, a statement such as

getline x < "tmp" FILENAME

sets x to read the file tmp and not tmp <value of FILENAME>. Also, if you
use this getline statement form, a statement like

while ( getline x < file ) { ... }

loops forever if the file cannot be read, because getline returns -1, not
zero, if an error occurs. A better way to write this test is

while ( (getline x < file) > 0 ) { ... }
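
The three return values can be observed with a sketch such as the one below. It assumes a POSIX shell with the mktemp command available; the file contents and the names good, bad, line, and status are invented, and the second file name is deliberately one that does not exist.

```shell
# Count lines with a guarded loop, then show the result for an unreadable file.
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"
out=$(awk 'BEGIN {
    good = ARGV[1]; bad = ARGV[2]
    while ((getline line < good) > 0)   # 1 per record, 0 at end of file
        n++
    close(good)
    status = (getline line < bad)       # -1 because bad cannot be opened
    print n, status
}' "$tmp" "$tmp.missing")
rm -f "$tmp"
echo "$out"
```

Since the program consists only of a BEGIN action, awk never reads the files named on the command line itself; they are reached only through ARGV.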



NOTE

It is also possible to pipe the output of another command directly into
getline. For example, the statement

    while ("who" | getline)
        n++

executes who and pipes its output into getline. Each iteration of the while
loop reads one more line and increments the variable n, so after the while
loop terminates, n contains a count of the number of users. Similarly, the
statement

    "date" | getline d

pipes the output of date into the variable d, thus setting d to the current
date. Figure 4-9 summarizes the getline function.
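The same two forms can be exercised without depending on who is logged in. In this sketch the commands are plain echo calls and the function name count_output_lines is an assumption. The pipe is closed once it has been drained, which matters on implementations that allow only a few open pipes.

```shell
# count_output_lines: hypothetical demo of "command | getline".
count_output_lines() {
    awk '
    BEGIN {
        cmd = "echo one; echo two; echo three"
        while ((cmd | getline line) > 0)    # one iteration per output line
            n++
        close(cmd)                          # release the pipe
        "echo hello" | getline d            # first output line goes into d
        print n, d
    }'
}

count_output_lines
```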



    Form                    Sets

    getline                 $0, NF, NR, FNR
    getline var             var, NR, FNR
    getline <file           $0, NF
    getline var <file       var
    cmd | getline           $0, NF
    cmd | getline var       var



Figure 4-9: getline Function 



Command-line Arguments 

The command-line arguments are available to an awk program: the
array ARGV contains the elements ARGV[0], ..., ARGV[ARGC-1]; as in C,
ARGC is the count. ARGV[0] is the name of the program (generally awk);
the remaining arguments are whatever was provided (excluding the
program and any optional arguments). The following command line contains
an awk program that echoes the arguments that appear after the program
name:

    awk '
    BEGIN {
        for (i = 1; i < ARGC; i++)
            printf "%s ", ARGV[i]
        printf "\n"
    }' $*




The arguments may be modified or added to; ARGC may be altered. As
each input file ends, awk treats the next non-null element of ARGV (up to
the current value of ARGC-1) as the name of the next input file.

There is one exception to the rule that an argument is a file name: if it 
is of the form 

    var=value

then the variable var is set to the value value as if by assignment. Such an 
argument is not treated as a file name. If value is a string, no quotes are 
needed. 
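Because such an assignment takes effect when awk reaches it in the argument list, a different value can be set before each input file. The sketch below assumes two small temporary files and a variable named tag; all of these names are made up for the demonstration.

```shell
# show_assignments: hypothetical demo; tag is reassigned between files.
show_assignments() {
    dir=$(mktemp -d) || return 1
    printf 'a\n' > "$dir/first"
    printf 'b\n' > "$dir/second"
    # tag=one applies while first is read, tag=two while second is read
    awk '{ print tag ": " $0 }' tag=one "$dir/first" tag=two "$dir/second"
    rm -rf "$dir"
}

show_assignments
```

An assignment given this way is not yet in effect while the BEGIN action runs, since BEGIN is executed before any argument is examined as input.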
Using awk with Other Commands and the Shell

awk gains its greatest power when it is used in conjunction with other 
programs. Here we describe some of the ways in which awk programs 
cooperate with other commands. 



The system Function 

The built-in function system(command-line) executes the command
command-line, which may well be a string computed by, for example, the
built-in function sprintf. The value returned by system is the return status
of the command executed.

For example, the program

    $1 == "#include" { gsub(/[<>"]/, "", $2); system("cat " $2) }

calls the command cat to print the file named in the second field of every 
input record whose first field is #include, after stripping any <, > or " that 
might be present. 
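The return status can be tested like any other number. The following sketch uses the standard true and false commands; the function name status_demo and the variable names are assumptions.

```shell
# status_demo: hypothetical demo of the value returned by system().
status_demo() {
    awk 'BEGIN {
        ok  = system("true")     # a command that succeeds
        bad = system("false")    # a command that fails
        r1 = (ok == 0)           # 1 when the status is zero
        r2 = (bad != 0)          # 1 when the status is non-zero
        print r1, r2
    }'
}

status_demo
```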



Cooperation with the Shell 

In all the examples thus far, the awk program was in a file and fetched 
from there using the -f flag, or it appeared on the command line enclosed 
in single quotes, as in 

    awk '{ print $1 }' ...

Since awk uses many of the same characters as the shell does, such as $ and 
", surrounding the awk program with single quotes ensures that the shell 
will pass the entire program unchanged to the awk interpreter. 

Now, consider writing a command addr that will search a file
addresslist for name, address and telephone information. Suppose that
addresslist contains names and addresses in which a typical entry is a
multi-line record such as

    G. R. Emlin
    600 Mountain Avenue
    Murray Hill, NJ 07974
    201-555-1234

Records are separated by a single blank line.

We want to search the address list by issuing commands like

    addr Emlin

That is easily done by a program of the form

    awk '
    BEGIN { RS = "" }
    /Emlin/
    ' addresslist

The problem is how to get a different search pattern into the program each 
time it is run. 

There are several ways to do this. One way is to create a file called addr
that contains

    awk '
    BEGIN { RS = "" }
    /'$1'/
    ' addresslist

The quotes are critical here: the awk program is only one argument, even
though there are two sets of quotes, because quotes do not nest. The $1 is
outside the quotes, visible to the shell, which therefore replaces it by the
pattern Emlin when the command addr Emlin is invoked. On a UNIX system,
addr can be made executable by changing its mode with the following
command: chmod +x addr.

A second way to implement addr relies on the fact that the shell
substitutes for $ parameters within double quotes:

    awk "
    BEGIN { RS = \"\" }
    /$1/
    " addresslist

Here we must protect the quotes defining RS with backslashes, so that the
shell passes them on to awk, uninterpreted by the shell. $1 is recognized
as a parameter, however, so the shell replaces it by the pattern when the
command addr pattern is invoked.

A third way to implement addr is to use ARGV to pass the regular
expression to an awk program that explicitly reads through the address list
with getline:

    awk '
    BEGIN { RS = ""
            while ((getline < "addresslist") > 0)
                if ($0 ~ ARGV[1])
                    print $0
    } ' $*

All processing is done in the BEGIN action.

Notice that any regular expression can be passed to addr; in particular, 
it is possible to retrieve by parts of an address or telephone number as well 
as by name. 
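The var=value argument form described under Command-line Arguments offers one more possibility, sketched here under the assumption of a function named addr_lookup that takes the pattern and the address file as its two arguments. The variable name pat is also made up. No special quoting of the awk program is needed, because the pattern never appears inside it.

```shell
# addr_lookup PATTERN FILE: hypothetical variant of addr; FILE plays
# the role of addresslist.
addr_lookup() {
    # RS = "" makes each blank-line-separated entry one record;
    # pat is set from the command line before FILE is read
    awk 'BEGIN { RS = "" } $0 ~ pat' pat="$1" "$2"
}
```

Invoked as addr_lookup Emlin addresslist, this prints every entry that contains Emlin. One caveat: awk processes escape sequences in the assigned value, so a pattern containing backslashes needs extra care.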
Example Applications

awk has been used in surprising ways. We have seen awk programs
that implement database systems and a variety of compilers and assemblers,
in addition to the more traditional tasks of information retrieval, data
manipulation, and report generation. Invariably, the awk programs are
significantly shorter than equivalent programs written in more conventional
programming languages such as Pascal or C. In this section, we will
present a few more examples to illustrate some additional awk programs.

Generating Reports 

awk is especially useful for producing reports that summarize and
format information. Suppose we wish to produce a report from the file
countries in which we list the continents alphabetically, and after each
continent its countries in decreasing order of population:

    Africa:
            Sudan          19
            Algeria        18

    Asia:
            China         866
            India         637
            USSR          262

    Australia:
            Australia      14

    North America:
            USA           219
            Canada         24

    South America:
            Brazil        116
            Argentina      26
As with many data processing tasks, it is much easier to produce this
report in several stages. First, we create a list of continent-country-
population triples, in which each field is separated by a colon. This can be
done with the following program triples, which uses an array pop indexed
by subscripts of the form 'continent:country' to store the population of a
given country. The print statement in the END section of the program
creates the list of continent-country-population triples that are piped to the
sort routine.

    BEGIN { FS = "\t" }
          { pop[$4 ":" $1] += $3 }
    END   { for (cc in pop)
                print cc ":" pop[cc] | "sort -t: +0 -1 +2nr" }

The arguments for sort deserve special mention. The -t: argument tells
sort to use : as its field separator. The +0 -1 arguments make the first
field the primary sort key. In general, +i -j makes fields i+1, i+2, ..., j the
sort key. If -j is omitted, the fields from i+1 to the end of the record are
used. The +2nr argument makes the third field, numerically decreasing, the
secondary sort key (n is for numeric, r for reverse order). Invoked on the
file countries, this program produces as output

    Africa:Sudan:19
    Africa:Algeria:18
    Asia:China:866
    Asia:India:637
    Asia:USSR:262
    Australia:Australia:14
    North America:USA:219
    North America:Canada:24
    South America:Brazil:116
    South America:Argentina:26
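Many current versions of sort no longer accept the +i -j notation; POSIX sort expresses the same keys with the -k option. The following sketch, which assumes such a sort and uses the made-up function name sort_triples, orders colon-separated triples in the same way as the arguments above.

```shell
# sort_triples: hypothetical helper; sorts continent:country:population
# lines by continent, then by population in decreasing numeric order.
sort_triples() {
    # -k1,1   key is field 1 only          (the old +0 -1)
    # -k3,3nr key is field 3, numeric, reversed (the old +2nr)
    sort -t: -k1,1 -k3,3nr
}

printf 'Asia:India:637\nAfrica:Algeria:18\nAsia:China:866\nAfrica:Sudan:19\n' |
    sort_triples
```

With such a sort, the string passed to the pipe in the triples program would be "sort -t: -k1,1 -k3,3nr".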
This output is in the right order but the wrong format. To transform
the output into the desired form we run it through a second awk program
format:

    BEGIN { FS = ":" }
    {
        if ($1 != prev) {
            print "\n" $1 ":"
            prev = $1
        }
        printf "\t%-10s %6d\n", $2, $3
    }

This is a control-break program that prints only the first occurrence of a
continent name and formats the country-population lines associated with
that continent in the desired manner. The command line

    awk -f triples countries | awk -f format

gives us our desired report. As this example suggests, complex data
transformation and formatting tasks can often be reduced to a few simple
awks and sorts.

As an exercise, add to the population report subtotals for each continent
and a grand total.

Additional Examples

Word Frequencies

Our first example illustrates associative arrays for counting. Suppose we
want to count the number of times each word appears in the input, where a
word is any contiguous sequence of non-blank, non-tab characters. The
following program prints the word frequencies, sorted in decreasing order.

        { for (w = 1; w <= NF; w++) count[$w]++ }
    END { for (w in count) print count[w], w | "sort -nr" }

The first statement uses the array count to accumulate the number of times
each word is used. Once the input has been read, the second for loop
pipes the final count along with each word into the sort command.
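To try the program on a small input, it can be wrapped in a shell function. The name word_freq and the sample text are assumptions.

```shell
# word_freq: hypothetical wrapper around the word-frequency program;
# reads text on standard input.
word_freq() {
    awk '
        { for (w = 1; w <= NF; w++) count[$w]++ }
    END { for (w in count) print count[w], w | "sort -nr" }'
}

printf 'c b c a\nb c\n' | word_freq
```

Words with equal counts may appear in any order relative to one another, since sort is given only the numeric key.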

Accumulation 

Suppose we have two files, deposits and withdrawals, of records
containing a name field and an amount field. For each name we want to print
the net balance determined by subtracting the total withdrawals from the
total deposits for each name. The net balance can be computed by the
following program:

    awk '
    FILENAME == "deposits"    { balance[$1] += $2 }
    FILENAME == "withdrawals" { balance[$1] -= $2 }
    END { for (name in balance)
              print name, balance[name]
    } ' deposits withdrawals

The first statement uses the array balance to accumulate the total amount 
for each name in the file deposits. The second statement subtracts associ- 
ated withdrawals from each total. If there are only withdrawals associated 
with a name, an entry for that name will be created by the second state- 
ment. The END action prints each name with its net balance. 

Random Choice 

The following function prints (in order) k random elements from the
first n elements of the array A. In the program, k is the number of entries
that still need to be printed, and n is the number of elements yet to be
examined. The decision of whether to print the ith element is determined
by the test rand() < k/n.

    function choose(A, k, n) {
        for (i = 1; n > 0; i++)
            if (rand() < k/n--) {
                print A[i]
                k--
            }
    }
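A complete program built around this function might look like the following sketch. The wrapper name pick, the use of the integers 1..n as the array contents, and the extra local parameter i are assumptions made for illustration. The two counts are taken from ARGV and coerced to numbers by adding 0.

```shell
# pick K N: hypothetical wrapper; prints K of the integers 1..N,
# chosen at random, in increasing order.
pick() {
    awk '
    function choose(A, k, n,    i) {      # i is a local variable
        for (i = 1; n > 0; i++)
            if (rand() < k / n--) {       # chance is (still needed)/(still left)
                print A[i]
                k--
            }
    }
    BEGIN {
        k = ARGV[1] + 0; n = ARGV[2] + 0
        srand()                           # seed from the time of day
        for (j = 1; j <= n; j++)
            A[j] = j
        choose(A, k, n)
    }' "$1" "$2"
}

pick 3 10
```

When as many entries are still needed as there are elements left, k/n is 1 and every remaining element is printed, so exactly k elements are always selected.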



Shell Facility 

The following awk program simulates (crudely) the history facility of
the UNIX system shell. A line containing only = re-executes the last
command executed. A line beginning with = cmd re-executes the last command
whose invocation included the string cmd. Otherwise, the current line is
executed.

    $1 == "=" { if (NF == 1)
                    system(x[NR] = x[NR-1])
                else
                    for (i = NR-1; i > 0; i--)
                        if (x[i] ~ $2) {
                            system(x[NR] = x[i])
                            break
                        }
                next }

    /./       { system(x[NR] = $0) }
Form-letter Generation

The following program generates form letters, using a template stored
in a file called form.letter:

    This is a form letter.
    The first field is $1, the second $2, the third $3.
    The third is $3, second is $2, and first is $1.

and replacement text of this form:

    field 1|field 2|field 3
    one|two|three
    a|b|c

The BEGIN action stores the template in the array line; the remaining
action cycles through the input data, using gsub to replace template fields
of the form $n with the corresponding data fields.

    BEGIN { FS = "|"
            while ((getline <"form.letter") > 0)
                line[++n] = $0
    }
    { for (i = 1; i <= n; i++) {
          s = line[i]
          for (j = 1; j <= NF; j++)
              gsub("\\$"j, $j, s)
          print s
      }
    }

In all such examples, a prudent strategy is to start with a small version
and expand it, trying out each aspect before moving on to the next.
awk Summary

Command Line

    awk program filenames
    awk -f program-file filenames
    awk -Fs     sets field separator to string s; -Ft sets separator to tab

Patterns

    BEGIN
    END
    /regular expression/
    relational expression
    pattern && pattern
    pattern || pattern
    (pattern)
    !pattern
    pattern, pattern



Control Flow Statements 

if (expr) statement [else statement] 

if (subscript in array) statement [else statement] 

while (expr) statement 

for (expr; expr; expr) statement 

for (var in array) statement 

do statement while (expr) 

break 

continue 

next 

exit [expr] 
return [expr] 
Input-output

    close(filename)              close file
    getline                      set $0 from next input record; set NF, NR, FNR
    getline <file                set $0 from next record of file; set NF
    getline var                  set var from next input record; set NR, FNR
    getline var <file            set var from next record of file
    print                        print current record
    print expr-list              print expressions
    print expr-list >file        print expressions on file
    printf fmt, expr-list        format and print
    printf fmt, expr-list >file  format and print on file
    system(cmd-line)             execute command cmd-line, return status

In print and printf above, >>file appends to the file, and | command
writes on a pipe. Similarly, command | getline pipes into getline. getline
returns 0 on end of file, and -1 on error.



Functions 

    func name(parameter list) { statement }
    function name(parameter list) { statement }
    function-name(expr, expr, ...)
String Functions

    gsub(r, s, t)            substitute string s for each substring matching
                             regular expression r in string t, return number
                             of substitutions; if t omitted, use $0
    index(s, t)              return index of string t in string s, or 0 if not
                             present
    length(s)                return length of string s
    match(s, r)              return position in s where regular expression r
                             occurs, or 0 if r is not present
    split(s, a, r)           split string s into array a on regular expression
                             r, return number of fields; if r omitted, FS is
                             used in its place
    sprintf(fmt, expr-list)  print expr-list according to fmt, return
                             resulting string
    sub(r, s, t)             like gsub except only the first matching
                             substring is replaced
    substr(s, i, n)          return n-char substring of s starting at i; if n
                             omitted, use rest of s



Arithmetic Functions 



    atan2(y,x)     arctangent of y/x in radians
    cos(expr)      cosine (angle in radians)
    exp(expr)      exponential
    int(expr)      truncate to integer
    log(expr)      natural logarithm
    rand()         random number between 0 and 1
    sin(expr)      sine (angle in radians)
    sqrt(expr)     square root
    srand(expr)    new seed for random number generator;
                   use time of day if no expr






Operators (Increasing Precedence) 

    = += -= *= /= %= ^=    assignment
    ?:                     conditional expression
    ||                     logical OR
    &&                     logical AND
    ~ !~                   regular expression match, negated match
    < <= > >= != ==        relationals
    blank                  string concatenation
    + -                    add, subtract
    * / %                  multiply, divide, mod
    + - !                  unary plus, unary minus, logical negation
    ^                      exponentiation (** is a synonym)
    ++ --                  increment, decrement (prefix and postfix)
    $                      field



Regular Expressions (Increasing Precedence) 



    c            matches non-metacharacter c
    \c           matches literal character c
    .            matches any character but newline
    ^            matches beginning of line or string
    $            matches end of line or string
    [abc...]     character class matches any of abc...
    [^abc...]    negated class matches any but abc... and newline
    r1|r2        matches either r1 or r2
    r1r2         concatenation: matches r1, then r2
    r+           matches one or more r's
    r*           matches zero or more r's
    r?           matches zero or one r's
    (r)          grouping: matches r
Built-in Variables

    ARGC        number of command-line arguments
    ARGV        array of command-line arguments (0..ARGC-1)
    FILENAME    name of current input file
    FNR         input record number in current file
    FS          input field separator (default blank)
    NF          number of fields in current input record
    NR          input record number since beginning
    OFMT        output format for numbers (default %.6g)
    OFS         output field separator (default blank)
    ORS         output record separator (default newline)
    RS          input record separator (default newline)
    RSTART      index of first character matched by match(); 0 if no match
    RLENGTH     length of string matched by match(); -1 if no match
    SUBSEP      separates multiple subscripts in array elements; default "\034"



Limits 

Any particular implementation of awk enforces some limits. Here are 
typical values: 

100 fields 

2500 characters per input record 
2500 characters per output record 
1024 characters per individual field 
1024 characters per printf string 
400 characters maximum quoted string 
400 characters in character class 
15 open files 
1 pipe 

    numbers are limited to what can be represented on the local
    machine, e.g., 1e-38..1e+38
Initialization, Comparison, and Type Coercion

Each variable and field can potentially be a string or a number or both
at any time. When a variable is set by the assignment

    var = expr

its type is set to that of the expression. (Assignment includes +=, -=, etc.)
An arithmetic expression is of type number, a concatenation is of type 
string, and so on. If the assignment is a simple copy, as in 

v1 = v2 

then the type of v1 becomes that of v2. 

In comparisons, if both operands are numeric, the comparison is made 
numerically. Otherwise, operands are coerced to string if necessary, and the 
comparison is made on strings. The type of any expression can be coerced 
to numeric by subterfuges such as 

    expr + 0

and to string by

    expr ""

(that is, concatenation with a null string). 
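The effect of these coercions on a comparison can be seen in a short sketch. The function name compare_demo and the sample input are assumptions.

```shell
# compare_demo: hypothetical demo; the same two fields compared as
# numbers and then as strings.
compare_demo() {
    echo '10 9' | awk '{
        print ($1 > $2)              # both fields are numeric: 10 > 9 is true
        print (($1 "") > ($2 ""))    # coerced to strings: "10" > "9" is false
        print (($1 + 0) > ($2 + 0))  # coerced to numbers: true again
    }'
}

compare_demo
```

The middle comparison is false because strings are compared character by character, and "1" sorts before "9".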

Uninitialized variables have the numeric value 0 and the string value "".
Accordingly, if x is uninitialized,

if (x) ... 

is false, and 

if (!x) ... 

if (x == 0) ... 

if (x == "") ... 

are all true. But the following is false: 

    if (x == "0") ...
The type of a field is determined by context when possible; for example,
$1++ 

clearly implies that $1 is to be numeric, and 
$1 = $1 $2 

implies that $1 and $2 are both to be strings. Coercion is done as needed. 
In contexts where types cannot be reliably determined, for example, 
if ($1 == $2) ... 

the type of each field is determined on input. All fields are strings; in addi- 
tion, each field that contains only a number is also considered numeric. 

Fields that are explicitly null have the string value ""; they are not
numeric. Non-existent fields (i.e., fields past NF) are treated this way, too.

As it is for fields, so it is for array elements created by split().

Mentioning a variable in an expression causes it to exist, with the value
"" as described above. Thus, if arr[i] does not currently exist,

    if (arr[i] == "")

causes it to exist with the value "" so the if is satisfied. The special
construction

    if (i in arr) ...

determines if arr[i] exists without the side effect of creating it if it does
not.
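These rules can be checked directly. In the sketch below, the function name existence_demo and the names x, arr, and k are assumptions; each comparison result is stored in a variable before printing to keep the print statements simple.

```shell
# existence_demo: hypothetical demo of uninitialized values and of the
# "in" test, which does not create array elements.
existence_demo() {
    awk 'BEGIN {
        a = (x == 0); b = (x == ""); c = (x == "0")
        print a, b, c               # x is uninitialized: 1 1 0

        before = ("k" in arr)       # nothing has created arr["k"] yet
        if (arr["k"] == "")         # this reference creates the element
            after = ("k" in arr)
        print before, after         # 0 1
    }'
}

existence_demo
```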
An Overview of lex Programming



lex is a software tool that lets you solve a wide class of problems drawn 
from text processing, code enciphering, compiler writing, and other areas. 
In text processing, you may check the spelling of words for errors; in code 
enciphering, you may translate certain patterns of characters into others; 
and in compiler writing, you may determine what the tokens (smallest 
meaningful sequences of characters) are in the program to be compiled. 
The problem common to all of these tasks is recognizing different strings of 
characters that satisfy certain characteristics. In the compiler writing case, 
creating the ability to solve the problem requires implementing the 
compiler's lexical analyzer. Hence the name lex. 

It is not essential to use lex to handle problems of this kind. You could 
write programs in a standard language like C to handle them, too. In fact, 
what lex does is produce such C programs. (lex is therefore called a program
generator.) What lex offers you, once you acquire a facility with it, is
typically a faster, easier way to create programs that perform these tasks. Its 
weakness is that it often produces C programs that are longer than neces- 
sary for the task at hand and that execute more slowly than they otherwise 
might. In many applications this is a minor consideration, and the advan- 
tages of using lex considerably outweigh it. 

To understand what lex does, see the diagram in Figure 5-1. We begin 
with the lex source (often called the lex specification) that you, the pro- 
grammer, write to solve the problem at hand. This lex source consists of a 
list of rules specifying sequences of characters (expressions) to be searched 
for in an input text, and the actions to take when an expression is found. 
The source is read by the lex program generator. The output of the pro- 
gram generator is a C program that, in turn, must be compiled by a host 
language C compiler to generate the executable object program that does the 
lexical analysis. Note that this procedure is not typically automatic — user 
intervention is required. Finally, the lexical analyzer program produced by 
this process takes as input any source file and produces the desired output, 
such as altered text or a list of tokens. 
lex can also be used to collect statistical data on features of the input,
such as character count, word length, number of occurrences of a word, and
so forth. In later sections of this chapter, we will see

■ how to write lex source to do some of these tasks 

■ how to translate lex source 

■ how to compile, link, and execute the lexical analyzer in C 

■ how to run the lexical analyzer program 

We will then be on our way to appreciating the power that lex provides. 



    lex source --> [ lex ] --> lex analyzer in C --> [ C compiler ] --> lex analyzer program

    input text --> [ lex analyzer program ] --> output: tokens, text, etc.



Figure 5-1: Creation and Use of a Lexical Analyzer with lex 
A lex specification consists of at most three sections: definitions, rules,
and user subroutines. The rules section is mandatory. Sections for 
definitions and user subroutines are optional, but if present, must appear in 
the indicated order. 



The Fundamentals of lex Rules 

The mandatory rules section opens with the delimiter %%. If a subrou- 
tines section follows, another %% delimiter ends the rules section. If there 
is no second delimiter, the rules section is presumed to continue to the end 
of the program. 

Each rule consists of a specification of the pattern sought and the 
action(s) to take on finding it. (Note the dual meaning of the term 
specification — it may mean either the entire lex source itself or, within it, a 
representation of a particular pattern to be recognized.) Whenever the 
input consists of patterns not sought, lex writes out the input exactly as it 
finds it. So, the simplest lex program is just the beginning rules delimiter, 
%%. It writes out the entire input to the output with no changes at all. 
Typically, the rules are more elaborate than that. 

Specifications 

You specify the patterns you are interested in with a notation called 
regular expressions. A regular expression is formed by stringing together 
characters with or without operators. The simplest regular expressions are 
strings of text characters with no operators at all. For example, 

apple 

orange 

pluto 

These three regular expressions match any occurrences of those character
strings in an input text. If you want to have your lexical analyzer a.out
remove every occurrence of orange from the input text, you could specify
the rule

    orange ;

Because you did not specify an action on the right (before the semicolon),
lex does nothing but print out the original input text with every
occurrence of this regular expression removed, that is, without any
occurrence of the string orange at all.

Unlike orange above, most of the expressions that we want to search for 
cannot be specified so easily. The expression itself might simply be too 
long. More commonly, the class of desired expressions is too large; it may, 
in fact, be infinite. Thanks to the use of operators, we can form regular 
expressions signifying any expression of a certain class. The + operator, for 
instance, means one or more occurrences of the preceding expression, the ?
means 0 or 1 occurrence(s) of the preceding expression (this is equivalent,
of course, to saying that the preceding expression is optional), and * means
0 or more occurrences of the preceding expression. (It may at first seem odd
to speak of 0 occurrences of an expression and to need an operator to
capture the idea, but it is often quite helpful. We will see an example in a
moment.) So m+ is a regular expression matching any string of ms such as
each of the following:

    mmm
    m
    mmmmmmm
    mm

and 7* is a regular expression matching any string of zero or more 7s:

    77
    77777

    777

The string of blanks on the third line matches simply because it has no 7s
in it at all.

Brackets, [ ], indicate any one character from the string of characters 
specified between the brackets. Thus, [dgka] matches a single d, g, k, or a. 
Note that commas are not included within the brackets. Any comma here 
would be taken as a character to be recognized in the input text. Ranges 
within a standard alphabetic or numeric order are indicated with a hyphen,
-. The sequence [a-z], for instance, indicates any lowercase letter.
Somewhat more interestingly,

    [A-Za-z0-9*&#]

is a regular expression that matches any letter (whether upper- or
lowercase), any digit, an asterisk, an ampersand, or a sharp character.
Given the input text

    $$$$?? $$$$$$&*+====r~~# ((

the lexical analyzer with the previous specification in one of its rules will
recognize the &, *, r, and #, perform on each recognition whatever action
the rule specifies (we have not indicated an action here), and print out the
rest of the text as it stands.

The operators become especially powerful in combination. For example, 
the regular expression to recognize an identifier in many programming 
languages is 

    [a-zA-Z][0-9a-zA-Z]*

An identifier in these languages is defined to be a letter followed by 
zero or more letters or digits, and that is just what the regular expression 
says. The first pair of brackets matches any letter. The second, if it were 
not followed by a *, would match any digit or letter. The two pairs of 
brackets with their enclosed characters would then match any letter fol- 
lowed by a digit or a letter. But with the asterisk, the example matches 
any letter followed by any number of letters or digits. In particular, it 
would recognize the following as identifiers: 

    e
    pay
    distance
    pH
    EngineNo99
    R2D2

Note that it would not recognize the following as identifiers:

    not_idenTIFER
    5times
    $hello

because not_idenTIFER has an embedded underscore; 5times starts with a
digit, not a letter; and $hello starts with a special character. Of course, you
may want to write the specifications for these three examples as an exercise.
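The identifier pattern can be tried out before any lex source is written. The following sketch uses awk rather than lex, since awk regular expressions share the bracket and * operators used here. The function name is_identifier is an assumption, and the ^ and $ anchors are added so that the whole word, not just part of it, must match.

```shell
# is_identifier WORD: hypothetical helper; uses awk (not lex) to test
# whether WORD as a whole matches the identifier pattern.
is_identifier() {
    printf '%s\n' "$1" |
    awk '{
        if ($0 ~ /^[a-zA-Z][0-9a-zA-Z]*$/)
            print "yes"
        else
            print "no"
    }'
}

is_identifier R2D2
is_identifier 5times
```

The first call prints yes and the second prints no. A lexical analyzer produced by lex behaves differently in one respect: it is not anchored, so on the input 5times it would skip the digit and then recognize times.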
A potential problem with operator characters is how we can refer to
them as characters to look for in our search pattern. The last example, for
instance, will not recognize text with an * in it. lex solves the problem in
one of two ways: a character enclosed in quotation marks or a character
preceded by a \ is taken literally, that is, as part of the text to be searched
for. To use the backslash method to recognize, say, an * followed by any
number of digits, we can use the pattern

    \*[1-9]*

To recognize a \ itself, we need two backslashes: \\.

Actions

Once lex recognizes a string matching the regular expression at the start 
of a rule, it looks to the right of the rule for the action to be performed. 
Kinds of actions include recording the token type found and its value, if 
any; replacing one token with another; and counting the number of 
instances of a token or token type. What you want to do is write these 
actions as program fragments in the host language C. An action may con- 
sist of as many statements as are needed for the job at hand. You may want 
to print out a message noting that the text has been found or a message 
transforming the text in some way. Thus, to recognize the expression
Amelia Earhart and to note such recognition, the rule

    "Amelia Earhart"        printf("found Amelia");

would do. And to replace in a text lengthy medical terms with their
equivalent acronyms, a rule such as

    Electroencephalogram    printf("EEG");

would be called for. To count the lines in a text, we need to recognize
end-of-lines and increment a linecounter. lex uses the standard escape
sequences from C like \n for end-of-line. To count lines we might have

    \n      lineno++;

where lineno, like other C variables, is declared in the definitions section 
that we discuss later. 

lex stores every character string that it recognizes in a character array 
called yytext[ ]. You can print or manipulate the contents of this array as 
you want. Sometimes your action may consist of two or more C statements 
and you must (or for style and clarity, you choose to) write it on several 
lines. To inform lex that the action is for one rule only, simply enclose the 
C code in braces. For example, to count the total number of all digit strings
in an input text, print the running total of the number of digit strings (not
their sum, here) and print out each one as soon as it is found, your lex code
might be

    \+?[1-9]+       { digstrngcount++;
                      printf("%d", digstrngcount);
                      printf("%s", yytext); }

This specification matches digit strings whether they are preceded by a plus
sign or not, because the ? indicates that the preceding plus sign is optional.
In addition, it will catch negative digit strings because that portion
following the minus sign, -, will match the specification. The next section
explains how to distinguish negative from positive integers.

Advanced lex Usage

lex provides a suite of features that lets you process input text riddled
with quite complicated patterns. These include rules that decide what
specification is relevant, when more than one seems so at first; functions
that transform one matching pattern into another; and the use of definitions
and subroutines. Before considering these features, you may want to affirm
your understanding thus far by examining an example drawing together
several of the points already covered.

    -[0-9]+         printf("negative integer");
    \+?[0-9]+       printf("positive integer");
    -0\.[0-9]+      printf("negative fraction, no whole number part");
    rail[ ]+road    printf("railroad is one word");
    crook           printf("Here's a crook");
    function        subprogcount++;
    G[a-zA-Z]*      { printf("may have a G word here: %s", yytext);
                      Gstringcount++; }
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The first three rules recognize negative integers, positive integers, and 
negative fractions between and -1. The use of the terminating + in each 
specification ensures that one or more digits compose the number in ques- 
tion. Each of the next three rules recognizes a specific pattern. The 
specification for railroad matches cases where one or more blanks intervene 
between the two syllables of the word. In the cases of railroad and crook, 
you may have simply printed a synonym rather than the messages stated. 
The rule recognizing a function simply increments a counter. The last rule 
illustrates several points: 

■ The braces specify an action sequence extending over several lines. 

■ Its action uses the lex array yytext[ ], which stores the recognized 
character string. 

■ Its specification uses the * to indicate that zero or more letters may
follow the G.

Some Special Features 

Besides storing the recognized character string in yytext[], lex automati- 
cally counts the number of characters in a match and stores it in the vari- 
able yyleng. You may use this variable to refer to any specific character just 
placed in the array yytext[ ]. Remember that C numbers locations in an 
array starting with 0, so to print out the third digit (if there is one) in a just 
recognized integer, you might write 

[1-9]+ {if (yyleng > 2) 

printf("%c", yytext[2]); } 

lex follows a number of high-level rules to resolve ambiguities that may 
arise from the set of rules that you write. Prima facie, any reserved word, 
for instance, could match two rules. In the lexical analyzer example 
developed later in the section on lex and yacc, the reserved word end could 
match the second rule as well as the eighth, the one for identifiers.

lex follows the rule that where there is a match with two or more rules in 
a specification, the first rule is the one whose action will be executed. 



NOTE 



By placing the rule for end and the other reserved words before the rule for 
identifiers, we ensure that our reserved words will be duly recognized. 
Another potential problem arises from cases where one pattern you are
searching for is the prefix of another. For instance, the last two rules in
that lexical analyzer example are designed to recognize > and >=. If
the text has the string >= at one point, you might worry that the lexical
analyzer would stop as soon as it recognized the > character to execute the
rule for > rather than read the next character and execute the rule for >=.



NOTE 



lex follows the rule that it matches the longest character string possi- 
ble and executes the rule for that. 



Here it would recognize the >= and act accordingly. As a further example, 
the rule would enable you to distinguish + from ++ in a program in C. 
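
The two disambiguating rules can be sketched in ordinary C. The fragment below is only an illustration of the selection logic, not the code lex generates: the four rules, the names pick_rule( ), match_literal( ), and match_ident( ), and the rule numbers are all assumptions made for the example, and the identifier pattern is simplified to a run of lowercase letters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rule numbers in the order the rules would be written (hypothetical). */
enum { RULE_END, RULE_IDENT, RULE_GT, RULE_GE, NRULES };
#define NO_RULE (-1)

/* Length of the literal lit if s starts with it, otherwise 0. */
static size_t match_literal(const char *s, const char *lit)
{
    size_t n = strlen(lit);
    return strncmp(s, lit, n) == 0 ? n : 0;
}

/* Length of the leading run of lowercase letters, standing in for [a-z]+. */
static size_t match_ident(const char *s)
{
    size_t n = 0;
    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

/*
 * Return the number of the rule that wins at the start of s, or NO_RULE.
 * The length of the winning match is stored in *len.
 */
int pick_rule(const char *s, size_t *len)
{
    size_t cand[NRULES];
    int best = NO_RULE;
    int i;

    cand[RULE_END]   = match_literal(s, "end");
    cand[RULE_IDENT] = match_ident(s);
    cand[RULE_GT]    = match_literal(s, ">");
    cand[RULE_GE]    = match_literal(s, ">=");

    *len = 0;
    for (i = 0; i < NRULES; i++) {
        /* Only a strictly longer match displaces an earlier rule. */
        if (cand[i] > *len) {
            *len = cand[i];
            best = i;
        }
    }
    return best;
}
```

Under these assumptions, the input end selects the reserved-word rule because it comes first among equally long matches, ending selects the identifier rule because that match is longer, and >= selects the rule for >= rather than the rule for >.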

Still another potential problem exists when the analyzer must read char- 
acters beyond the string you are seeking because you cannot be sure you've 
in fact found it until you've read the additional characters. These cases 
reveal the importance of trailing context. The classic example here is the 
DO statement in FORTRAN. In the statement 

DO 50 k = 1 , 20, 1 

we cannot be sure that the first 1 is the initial value of the index k until we 
read the first comma. Until then, we might have the assignment statement 

DO50k = 1 

(Remember that FORTRAN ignores all blanks.) The way to handle this is to 
use the forward-looking slash, / (not the backslash, \), which signifies that 
what follows is trailing context, something not to be stored in yytext[ ], 
because it is not part of the token itself. So the rule to recognize the FOR- 
TRAN DO statement could be 

DO/[ ]*[0-9][ ]*[a-z A-Z0-9]+=[a-z A-Z0-9]+,     printf("found DO");

Different versions of FORTRAN have limits on the size of identifiers, here 
the index name. To simplify the example, the rule accepts an index name 
of any length. 

lex uses the $ as an operator to mark a special trailing context: the end
of line. (It is therefore equivalent to /\n.) An example would be a rule to
ignore all blanks and tabs at the end of a line:

[ \t]+$ ; 
On the other hand, if you want to match a pattern only when it starts a
line, lex offers you the circumflex, ^, as the operator. The formatter nroff,
for example, demands that you never start a line with a blank, so you might
want to check input to nroff with some such rule as:

^[ ]     printf("error: remove leading blank");

Finally, some of your action statements themselves may require your
reading another character, putting one back to be read again a moment
later, or writing a character on an output device. lex supplies three
functions to handle these tasks: input( ), unput(c), and output(c), respectively.
One way to ignore all characters between two special characters, say
between a pair of double quotation marks, would be to use input( ), thus:

\"     while (input( ) != '"');

Upon finding the first double quotation mark, the generated a.out will sim- 
ply continue reading all subsequent characters so long as none is a quota- 
tion mark, and not again look for a match until it finds a second double 
quotation mark. 

To handle special I/O needs, such as writing to several files, you may 
use standard I/O routines in C to rewrite the functions input(), unput(c), 
and output(c). These and other programmer-defined functions should be
placed in your subroutine section. Your new routines will then replace the 
standard ones. The standard input( ), in fact, is equivalent to getchar( ), and 
the standard output(c) is equivalent to putchar(c). 

There are a number of lex routines that let you handle sequences of 
characters to be processed in more than one way. These include yymore( ), 
yyless(n), and REJECT. Recall that the text matching a given specification 
is stored in the array yytext[ ]. In general, once the action is performed for 
the specification, the characters in yytext[ ] are overwritten with succeeding 
characters in the input stream to form the next match. The function 
yymore( ), by contrast, ensures that the succeeding characters recognized are 
appended to those already in yytext[ ]. This lets you do one thing and then 
another, when one string of characters is significant and a longer one 
including the first is significant as well. Consider a character string bound 
by Bs and interspersed with one at an arbitrary location. 
In a simple code deciphering situation, you may want to count the
number of characters between the first and second B's and add it to the
number of characters between the second and third B. (Only the last B is
not to be counted.) The code to do this is

B[^B]*      { if (flag == 0) {
                  save = yyleng;
                  flag = 1;
                  yymore( );
              }
              else {
                  importantno = save + yyleng;
                  flag = 0;
              }
            }


where flag, save, and importantno are declared (and at least flag initialized 
to 0) in the definitions section. The flag distinguishes the character 
sequence terminating just before the second B from that terminating just 
before the third. 

The function yyless(n) lets you reset the end point of the string to be
considered to the nth character in the original yytext[ ]. Suppose you are
again in the code deciphering business and the gimmick here is to work
with only half the characters in a sequence ending with a certain one, say
upper- or lowercase Z. The code you want might be

[a-yA-Y]+[Zz]     { yyless(yyleng/2);
                    ... process first half of string ... }

Finally, the function REJECT lets you more easily process strings of 
characters even when they overlap or contain one another as parts. REJECT 
does this by immediately jumping to the next rule and its specification 
without changing the contents of yytext[ ]. If you want to count the
number of occurrences both of the regular expression snapdragon and of its 
subexpression dragon in an input text, the following will do: 
snapdragon     {countflowers++; REJECT;}
dragon         countmonsters++;



As an example of one pattern overlapping another, the following counts
the number of occurrences of the expressions comedian and diana, even
where the input text has sequences such as comediana:

comedian     {comiccount++; REJECT;}
diana        princesscount++;



Note that the actions here may be considerably more complicated than 
simply incrementing a counter. In all cases, the counters and other neces- 
sary variables are declared in the definitions section commencing the lex 
specification. 

Definitions 

The lex definitions section may contain any of several classes of items. 
The most critical are external definitions, #include statements, and abbrevi- 
ations. Recall that for legal lex source this section is optional, but in most 
cases some of these items are necessary. External definitions have the form 
and function that they do in C. They declare that variables globally defined 
elsewhere (perhaps in another source file) will be accessed in your
lex-generated a.out. Consider a declaration from an example to be developed
later.

extern int tokval;

When you store an integer value in a variable declared in this way, it 
will be accessible in the routine, say a parser, that calls it. If, on the other 
hand, you want to define a local variable for use within the action sequence 
of one rule (as you might for the index variable for a loop), you can declare 
the variable at the start of the action itself right after the left brace, { . 

The purpose of the #include statement is the same as in C: to include 
files of importance for your program. Some variable declarations and lex 
definitions might be needed in more than one lex source file. It is then 
advantageous to place them all in one file to be included in every file that 
needs them. One example occurs in using lex with yacc, which generates 
parsers that call a lexical analyzer. In this context, you should include the 
file y.tab.h, which may contain #defines for token names. Like the
declarations, #include statements should come between %{ and %}, thus:
%{
#include "y.tab.h"
extern int tokval;
int lineno;
%}



In the definitions section, after the %} that ends your #include's and 
declarations, you place your abbreviations for regular expressions to be used 
in the rules section. The abbreviation appears on the left of the line and, 
separated by one or more spaces, its definition or translation appears on the 
right. When you later use abbreviations in your rules, be sure to enclose 
them within braces. 



NOTE 



The purpose of abbreviations is to avoid needless repetition in writing 
your specifications and to provide clarity in reading them. 



As an example, reconsider the lex source reviewed at the beginning of 
this section on advanced lex usage. The use of definitions simplifies our 
later reference to digits, letters, and blanks. This is especially true if the 
specifications appear several times: 
D     [0-9]
L     [a-zA-Z]
B     [ ]
%%
-{D}+            printf("negative integer");
\+?{D}+          printf("positive integer");
-0.{D}+          printf("negative fraction");
G{L}*            printf("may have a G word here");
rail{B}+road     printf("railroad is one word");
crook            printf("criminal");
\"\./{B}+        printf(".\"");




The last rule, newly added to the example and somewhat more complex
than the others, is used in the WRITER'S WORKBENCH Software, an AT&T
software product for promoting good writing. (See the UNIX System
WRITER'S WORKBENCH Software Release 3.0 User's Guide for information on
this product.) The rule ensures that a period always precedes a quotation
mark at the end of a sentence. It would change example". to example."

Subroutines 

You may want to use subroutines in lex for much the same reason that 
you do so in other programming languages. Action code that is to be used 
for several rules can be written once and called when needed. As with 
definitions, this can simplify the writing and reading of programs. The 
function put_in_tabl( ), to be discussed in the next section on lex and yacc, 
is a good candidate for a subroutine. 

Another reason to place a routine in this section is to highlight some 
code of interest or to simplify the rules section, even if the code is to be 
used for one rule only. As an example, consider the following routine to 
ignore comments in a language like C where comments occur between /*
and */ :
/* rest of rules */

skipcmnts( )
{
    for ( ; ; )
    {
        while (input( ) != '*');
        if (input( ) != '/')
            unput(yytext[yyleng-1]);
        else return;
    }
}




There are three points of interest in this example. First, the unput(c)
function (putting back the last character read) is necessary to avoid missing
the final / if the comment ends unusually with a **/. In this case, eventually
having read a *, the analyzer finds that the next character is not the
terminal / and must read some more. Second, the expression
yytext[yyleng-1] picks out that last character read. Third, this routine
assumes that the comments are not nested. (This is indeed the case with the
C language.) If, unlike C, they are nested in the source text, after
input( )ing the first */ ending the inner group of comments, the a.out will
read the rest of the comments as if they were part of the input to be
searched for patterns.
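
The same scanning logic can be tried outside lex. The sketch below is an assumption-laden stand-in rather than the routine above: it works on a string in memory instead of calling input( ) and unput(c), the name skip_comment( ) is hypothetical, and it avoids pushing a character back by remembering whether the previous character was a star. It also reports an unterminated comment, a case the routine above does not address.

```c
#include <stddef.h>

/*
 * s points just past an opening slash-star.  Return a pointer just past
 * the closing star-slash, or NULL if the text ends before one is found.
 * Comments are assumed not to nest.
 */
const char *skip_comment(const char *s)
{
    int prev_star = 0;      /* was the previous character a star? */

    for (; *s != '\0'; s++) {
        if (prev_star && *s == '/')
            return s + 1;
        prev_star = (*s == '*');
    }
    return NULL;            /* unterminated comment */
}
```

Because the flag stays set across a run of stars, a comment that ends with two stars and a slash is closed at the right place without any pushback.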

Other examples of subroutines would be programmer-defined versions 
of the I/O routines input(), unput(c), and output(), discussed above. Sub- 
routines such as these that may be exploited by many different programs 
would probably do best to be stored in their own individual file or library 
to be called as needed. The appropriate #include statements would then be 
necessary in the definitions section. 
Using lex with yacc

If you work on a compiler project or develop a program to check the 
validity of an input language, you may want to use the UNIX system pro- 
gram tool yacc. yacc generates parsers, programs that analyze input to 
ensure that it is syntactically correct. (yacc is discussed in detail in Chapter
6 of this guide.) lex often forms a fruitful union with yacc in the compiler 
development context. Whether or not you plan to use lex with yacc, be 
sure to read this section because it covers information of interest to all lex 
programmers. 

The lexical analyzer that lex generates (not the file that stores it) takes 
the name yylex( ). This name is convenient because yacc calls its lexical 
analyzer by this very name. To use lex to create the lexical analyzer for the 
parser of a compiler, you want to end each lex action with the statement 
return token, where token is a defined term whose value is an integer. The 
integer value of the token returned indicates to the parser what the lexical 
analyzer has found. The parser, whose file is called y.tab.c by yacc, then 
resumes control and makes another call to the lexical analyzer when it 
needs another token. 

In a compiler, the different values of the token indicate what, if any,
reserved word of the language has been found or whether an identifier,
constant, arithmetic operator, or relational operator has been found. In the
latter cases, the analyzer must also specify the exact value of the token:
what the identifier is, whether the constant, say, is 9 or 888, whether the
arithmetic operator is + or * (multiply), and whether the relational operator
is = or >. Consider the following portion of lex source for a lexical analyzer
for some programming language perhaps slightly reminiscent of Ada:
begin                    return(BEGIN);
end                      return(END);
while                    return(WHILE);
if                       return(IF);
package                  return(PACKAGE);
reverse                  return(REVERSE);
loop                     return(LOOP);
[a-zA-Z][a-zA-Z0-9]*     { tokval = put_in_tabl( );
                           return(IDENTIFIER); }
[0-9]+                   { tokval = put_in_tabl( );
                           return(INTEGER); }
\+                       { tokval = PLUS;
                           return(ARITHOP); }
\-                       { tokval = MINUS;
                           return(ARITHOP); }
>                        { tokval = GREATER;
                           return(RELOP); }
>=                       { tokval = GREATEREQL;
                           return(RELOP); }



Despite appearances, the tokens returned, and the values assigned to 
tokval, are indeed integers. Good programming style dictates that we use 
informative terms such as BEGIN, END, WHILE, and so forth to signify the 
integers the parser understands, rather than use the integers themselves. 
You establish the association by using #define statements in your parser 
calling routine in C. For example. 
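
A minimal sketch of such definitions follows. The particular integers are assumptions chosen for illustration, apart from PLUS, which the discussion in this section associates with 7. The helper token_name( ) is hypothetical and is included only to show the names being used in place of the integers.

```c
/* Token numbers.  Only PLUS = 7 comes from the text; the rest are assumed. */
#define BEGIN   1
#define END     2
#define WHILE   3
#define PLUS    7

/* Hypothetical helper: spell out a token number for diagnostics. */
const char *token_name(int token)
{
    switch (token) {
    case BEGIN: return "BEGIN";
    case END:   return "END";
    case WHILE: return "WHILE";
    case PLUS:  return "PLUS";
    default:    return "unknown";
    }
}
```

Nothing in token_name( ) mentions an integer directly, so renumbering a token means editing one #define line and recompiling.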
If the need arises to change the integer for some token type, you then
change the #define statement in the parser rather than hunt through the
entire program, changing every occurrence of the particular integer. In
using yacc to generate your parser, it is helpful to insert the statement

#include "y.tab.h"

into the definitions section of your lex source. The file y.tab.h provides 
#define statements that associate token names such as BEGIN, END, and so 
on with the integers of significance to the generated parser. 

To indicate the reserved words in the example, the returned integer 
values suffice. For the other token types, the integer value of the token 
type is stored in the programmer-defined variable tokval. This variable, 
whose definition was an example in the definitions section, is globally 
defined so that the parser as well as the lexical analyzer can access it. yacc 
provides the variable yylval for the same purpose. 

Note that the example shows two ways to assign a value to tokval. 
First, a function put_in_tabl( ) places the name and type of the identifier or 
constant in a symbol table so that the compiler can refer to it in this or a 
later stage of the compilation process. More to the present point, 
put_in_tabl( ) assigns a type value to tokval so that the parser can use the 
information immediately to determine the syntactic correctness of the input 
text. The function put_in_tabl( ) would be a routine that the compiler 
writer might place in the subroutines section discussed later. Second, in the
last few actions of the example, tokval is assigned a specific integer indicating
which arithmetic operator or relational operator the analyzer recognized. If the
variable PLUS, for instance, is associated with the integer 7 by means of the 
#define statement above, then when a + sign is recognized, the action 
assigns to tokval the value 7, which indicates the +. The analyzer indicates 
the general class of operator by the value it returns to the parser (in the 
example, the integer signified by ARITHOP or RELOP). 
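
The division of labor between the returned value and tokval can be sketched as a hand-written C function. This is an illustration under stated assumptions, not lex output: scan_operator( ) is a hypothetical name, it reads from a string rather than from the input stream, and every token number other than PLUS = 7 is invented for the example.

```c
#include <stddef.h>

/* Token classes returned to the parser (assumed values). */
enum { ARITHOP = 20, RELOP = 21 };

/* Specific operators stored in tokval (only PLUS = 7 is from the text). */
enum { PLUS = 7, MINUS = 8, GREATER = 9, GREATEREQL = 10 };

int tokval;     /* which operator was seen; shared with the parser */

/*
 * Classify the operator at the start of s.  Returns the token class, or 0
 * if s does not start with an operator, and stores the specific operator
 * in tokval.  *len receives the number of characters matched.
 */
int scan_operator(const char *s, size_t *len)
{
    *len = 1;
    switch (s[0]) {
    case '+':
        tokval = PLUS;
        return ARITHOP;
    case '-':
        tokval = MINUS;
        return ARITHOP;
    case '>':
        if (s[1] == '=') {      /* longest match: prefer >= over > */
            *len = 2;
            tokval = GREATEREQL;
        } else {
            tokval = GREATER;
        }
        return RELOP;
    }
    *len = 0;
    return 0;
}
```

A caller that needs only the general class looks at the return value; one that must tell + from - or > from >= consults tokval afterward.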
As you review the following few steps, you might recall Figure 5-1 at
the start of the chapter. To produce the lexical analyzer in C, run 

lex lex.l 

where lex.l is the file containing your lex specification. The name lex.l is 
conventionally the favorite, but you may use whatever name you want. 
The output file that lex produces is automatically called lex.yy.c; this is the 
lexical analyzer program that you created with lex. You then compile and 
link this as you would any C program, making sure that you invoke the lex
library with the -ll option:

cc lex.yy.c -ll

The lex library provides a default main( ) program that calls the lexical 
analyzer under the name yylex( ), so you need not supply your own main( ). 

If you have the lex specification spread across several files, you can run 
lex with each of them individually, but be sure to rename or move each 
lex.yy.c file (with mv) before you run lex on the next one. Otherwise, each 
will overwrite the previous one. Once you have all the generated .c files, 
you can compile all of them, of course, in one command line. 

With the executable a.out produced, you are ready to analyze any 
desired input text. Suppose that the text is stored under the filename textin 
(this name is also arbitrary). The lexical analyzer a.out by default takes 
input from your terminal. To have it take the file textin as input, simply 
use redirection, thus: 

a.out < textin 

By default, output will appear on your terminal, but you can redirect this as 
well: 

a.out < textin > textout 

In running lex with yacc, either may be run first. 

yacc -d grammar.y
lex lex.l 

spawns a parser in the file y.tab.c. (The -d option creates the file y.tab.h, 
which contains the #define statements that associate the yacc assigned 
integer token values with the user-defined token names.) To compile and
link the output files produced, run

cc lex.yy.c y.tab.c -ly -ll

Note that the yacc library is loaded (with the -ly option) before the lex
library (with the -ll option) to ensure that the main( ) program supplied
will call the yacc parser.

There are several options available with the lex command. If you use 
one or more of them, place them between the command name lex and the 
filename argument. If you care to see the C program, lex.yy.c, that lex
generates on your terminal (the default output device), use the -t option.

lex -t lex.l

The -v option prints out for you a small set of statistics describing the
so-called finite automata that lex produces with the C program lex.yy.c.
(For a detailed account of finite automata and their importance for lex, see
the Aho, Sethi, and Ullman text, Compilers: Principles, Techniques, and Tools,
Addison-Wesley, 1986.)

lex uses a table (a two-dimensional array in C) to represent its finite 
automaton. The maximum number of states that the finite automaton 
requires is set by default to 500. If your lex source has a large number of 
rules or the rules are very complex, this default value may be too small. 
You can enlarge the value by placing another entry in the definitions sec- 
tion of your lex source, as follows: 

%n 700

This entry tells lex to make the table large enough to handle as many as
700 states. (The -v option will indicate how large a number you should
choose.) If you need to increase the maximum number of state transitions
beyond 2000, the designated parameter is a, thus:

%a 2800 

Finally, check the Programmer's Reference Manual page on lex for a list of 
all the options available with the lex command. 
This tutorial has introduced you to lex programming. As with any
programming language, the way to master it is to write programs and then
write some more. 
yacc provides a general tool for imposing structure on the input to a
computer program. The yacc user prepares a specification that includes: 

■ a set of rules to describe the elements of the input 

■ code to be invoked when a rule is recognized 

■ either a definition or declaration of a low-level routine to examine 
the input 

yacc then turns the specification into a C language function that exam- 
ines the input stream. This function, called a parser, works by calling the 
low-level input scanner. The low-level input scanner, called a lexical 
analyzer, picks up items from the input stream. The selected items are 
known as tokens. Tokens are compared to the input construct rules, called 
grammar rules. When one of the rules is recognized, the user code supplied
for this rule (an action) is invoked. Actions are fragments of C language
code. They can return values and make use of values returned by other 
actions. 

The heart of the yacc specification is the collection of grammar rules. 
Each rule describes a construct and gives it a name. For example, one gram- 
mar rule might be 

date : month_name day ',' year ;

where date, month_name, day, and year represent constructs of interest; 
presumably, month_name, day, and year are defined in greater detail else- 
where. In the example, the comma is enclosed in single quotes. This 
means that the comma is to appear literally in the input. The colon and 
semicolon merely serve as punctuation in the rule and have no significance 
in evaluating the input. With proper definitions, the input 

July 4, 1776 

might be matched by the rule. 
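
To make concrete what it means for input to match this rule, here is a hand-written recognizer in C. It is only a sketch: yacc produces a table-driven parser rather than code of this shape, and the names is_date( ), month_name( ), number( ), and skip_blanks( ) are hypothetical. It assumes that day and year are strings of digits and that month names are spelled out in full.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static const char *const months[] = {
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December"
};

/* Skip blanks between the components of the rule. */
static const char *skip_blanks(const char *s)
{
    while (*s == ' ')
        s++;
    return s;
}

/* month_name: return a pointer past the month name, or NULL. */
static const char *month_name(const char *s)
{
    size_t i;

    for (i = 0; i < sizeof months / sizeof months[0]; i++) {
        size_t n = strlen(months[i]);
        if (strncmp(s, months[i], n) == 0)
            return s + n;
    }
    return NULL;
}

/* day or year: one or more digits; return a pointer past them, or NULL. */
static const char *number(const char *s)
{
    if (!isdigit((unsigned char)*s))
        return NULL;
    while (isdigit((unsigned char)*s))
        s++;
    return s;
}

/* date : month_name day ',' year ;   Returns 1 if all of s is a date. */
int is_date(const char *s)
{
    if ((s = month_name(skip_blanks(s))) == NULL)
        return 0;
    if ((s = number(skip_blanks(s))) == NULL)
        return 0;
    s = skip_blanks(s);
    if (*s != ',')              /* the literal comma must appear */
        return 0;
    if ((s = number(skip_blanks(s + 1))) == NULL)
        return 0;
    return *skip_blanks(s) == '\0';
}
```

Each component of the rule corresponds to one step in is_date( ), taken in left-to-right order; the comma is checked literally, just as the quoted comma in the rule requires.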

The lexical analyzer is an important part of the parsing function. This 
user-supplied routine reads the input stream, recognizes the lower-level 
constructs, and communicates these as tokens to the parser. The lexical 
analyzer recognizes constructs of the input stream as terminal symbols; the 
parser recognizes constructs as nonterminal symbols. To avoid confusion, 
we will refer to terminal symbols as tokens. 
There is considerable leeway in deciding whether to recognize constructs
using the lexical analyzer or grammar rules. For example, the rules

month_name : 'J' 'a' 'n' ;
month_name : 'F' 'e' 'b' ;
 . . .
month_name : 'D' 'e' 'c' ;

might be used in the above example. While the lexical analyzer only needs 
to recognize individual letters, such low-level rules tend to waste time and 
space, and may complicate the specification beyond the ability of yacc to 
deal with it. Usually, the lexical analyzer recognizes the month names and 
returns an indication that a month_name is seen. In this case, month_name 
is a token and the detailed rules are not needed. 

Literal characters such as a comma must also be passed through the lexi- 
cal analyzer and are also considered tokens. 

Specification files are very flexible. It is relatively easy to add to the 
above example the rule 

date : month '/' day '/' year ;

allowing

7/4/1776

as a synonym for

July 4, 1776

on input. In most cases, this new rule could be slipped into a working sys- 
tem with minimal effort and little danger of disrupting existing input. 
The input being read may not conform to the specifications. With a
left-to-right scan, input errors are detected as early as is theoretically
possible. Thus, not only is the chance of reading and computing with bad input
data substantially reduced, but the bad data usually can be found quickly. 
Error handling, provided as part of the input specifications, permits the 
reentry of bad data or the continuation of the input process after skipping 
over the bad data. 

In some cases, yacc fails to produce a parser when given a set of 
specifications. For example, the specifications may be self-contradictory, or 
they may require a more powerful recognition mechanism than that avail- 
able to yacc. The former cases represent design errors; the latter cases often 
can be corrected by making the lexical analyzer more powerful or by rewrit- 
ing some of the grammar rules. While yacc cannot handle all possible 
specifications, its power compares favorably with similar systems. More- 
over, the constructs that are difficult for yacc to handle are also frequently 
difficult for human beings to handle. Some users have reported that the 
discipline of formulating valid yacc specifications for their input revealed 
errors of conception or design early in the program development. 

The remainder of this chapter describes the following subjects: 

■ basic process of preparing a yacc specification 

■ parser operation 

■ handling ambiguities 

■ handling operator precedences in arithmetic expressions 

■ error detection and recovery 

■ the operating environment and special features of the parsers yacc 
produces 

■ suggestions to improve the style and efficiency of the specifications 

■ advanced topics 

In addition, there are two examples and a summary of the yacc input 
syntax. 
Names refer to either tokens or nonterminal symbols. yacc requires
token names to be declared as such. While the lexical analyzer may be 
included as part of the specification file, it is perhaps more in keeping with 
modular design to keep it as a separate file. Like the lexical analyzer, other 
subroutines may be included as well. Thus, every specification file
theoretically consists of three sections: the declarations, (grammar) rules,
and subroutines. The sections are separated by double percent signs, %% (the
percent sign is generally used in yacc specifications as an escape character).

A full specification file looks like: 

declarations 
%% 
rules 
%% 

subroutines 

when all sections are used. The declarations and subroutines sections are 
optional. The smallest legal yacc specification is 

%% 
rules 

Blanks, tabs, and newlines are ignored, but they may not appear in 
names or multicharacter reserved symbols. Comments may appear wherever
a name is legal. They are enclosed in /* ... */, as in the C language.

The rules section is made up of one or more grammar rules. A grammar 
rule has the form 

A : BODY ; 

where A represents a nonterminal symbol, and BODY represents a sequence 
of zero or more names and literals. The colon and the semicolon are yacc 
punctuation. 

Names may be of any length and may be made up of letters, dots, 
underscores, and digits although a digit may not be the first character of a 
name. Uppercase and lowercase letters are distinct. The names used in the 
body of a grammar rule may represent tokens or nonterminal symbols. 
A literal consists of a character enclosed in single quotes. As in the C
language, the backslash, \, is an escape character within literals, and all the
C language escapes are recognized. Thus:

'\n'      newline
'\r'      return
'\''      single quote ( ' )
'\\'      backslash ( \ )
'\t'      tab
'\b'      backspace
'\f'      form feed
'\xxx'    xxx in octal notation



are understood by yacc. For a number of technical reasons, the NULL char- 
acter (\0 or 0) should never be used in grammar rules. 

If there are several grammar rules with the same left-hand side, the 
vertical bar, | , can be used to avoid rewriting the left-hand side. In addi- 
tion, the semicolon at the end of a rule is dropped before a vertical bar. 
Thus the grammar rules 

A : B C D ; 
A : E F ; 
A : G ; 

can be given to yacc as 

A : B C D
  | E F
  | G
  ;

by using the vertical bar. It is not necessary that all grammar rules with the 
same left side appear together in the grammar rules section although it 
makes the input more readable and easier to change. 

If a nonterminal symbol matches the empty string, this can be indicated by

epsilon : ; 

The blank space following the colon is understood by yacc to be a nonter- 
minal symbol named epsilon. 
Names representing tokens must be declared. This is most simply done
by writing

%token name1 name2 ...

in the declarations section. Every name not defined in the declarations sec- 
tion is assumed to represent a nonterminal symbol. Every nonterminal 
symbol must appear on the left side of at least one rule. 

Of all the nonterminal symbols, the start symbol has particular impor- 
tance. By default, the start symbol is taken to be the left-hand side of the 
first grammar rule in the rules section. It is possible and desirable to 
declare the start symbol explicitly in the declarations section using the 
%start keyword. 

%start symbol 

The end of the input to the parser is signaled by a special token, called 
the end-marker. The end-marker is represented by either a zero or a nega- 
tive number. If the tokens up to but not including the end-marker form a 
construct that matches the start symbol, the parser function returns to its 
caller after the end-marker is seen and accepts the input. If the end-marker 
is seen in any other context, it is an error. 

It is the job of the user-supplied lexical analyzer to return the end- 
marker when appropriate. Usually the end-marker represents some reason- 
ably obvious I/O status, such as end of file or end of record. 
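
A small hand-written lexical analyzer shows the end-marker convention in use. This is a sketch under stated assumptions: it scans a string in memory instead of a file, the names set_input( ), next_token( ), and token_value are hypothetical, and 257 is an assumed number for the NUMBER token. A literal character is returned as its own character code.

```c
#include <ctype.h>

enum { NUMBER = 257 };          /* assumed token number */

int token_value;                /* value of the most recent NUMBER token */

static const char *cursor = ""; /* next unread character of the input */

/* Start scanning a new piece of text. */
void set_input(const char *text)
{
    cursor = text;
}

/* Return the next token, or 0 (the end-marker) when the input runs out. */
int next_token(void)
{
    while (*cursor == ' ')
        cursor++;

    if (*cursor == '\0')
        return 0;               /* end-marker: nothing left to read */

    if (isdigit((unsigned char)*cursor)) {
        token_value = 0;
        while (isdigit((unsigned char)*cursor))
            token_value = token_value * 10 + (*cursor++ - '0');
        return NUMBER;
    }

    return (unsigned char)*cursor++;    /* literal character token */
}
```

Once the text is exhausted, every further call returns the end-marker again, so a caller that asks for one token too many still sees a well-defined value.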



Actions 

With each grammar rule, the user may associate actions to be performed 
when the rule is recognized. Actions may return values and may obtain the 
values returned by previous actions. Moreover, the lexical analyzer can 
return values for tokens if desired. 

An action is an arbitrary C language statement and as such can do input 
and output, call subroutines, and alter arrays and variables. An action is 
specified by one or more statements enclosed in curly braces, { and }. For
example: 
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are grammar rules with actions. 

The dollar sign symbol, $, is used to facilitate communication between
the actions and the parser. The pseudo-variable $$ represents the value 
returned by the complete action. For example, the action 

{ $$ = 1; } 

returns the value of one; in fact, that's all it does. 

To obtain the values returned by previous actions and the lexical 
analyzer, the action may use the pseudo-variables $1, $2, ... $n. These refer 
to the values returned by components 1 through n of the right side of a 
rule, with the components being numbered from left to right. If the rule is 

A : B C D ;

then $2 has the value returned by C, and $3 the value returned by D.

The rule

expr : '(' expr ')' ;

provides a common example. One would expect the value returned by this
rule to be the value of the expr within the parentheses. Since the first
component of the action is the literal left parenthesis, the desired logical
result can be indicated by

expr : '(' expr ')'
        {
                $$ = $2;
        }

By default, the value of a rule is the value of the first element in it ($1). 
Thus, grammar rules of the form 

A : B ; 

frequently need not have an explicit action.

In previous examples, all the actions came at the end of rules. Sometimes,
it is desirable to get control before a rule is fully parsed. yacc permits an
action to be written in the middle of a rule as well as at the end. This
action is assumed to return a value accessible through the usual $ mechanism
by the actions to the right of it. In turn, it may access the values returned
by the symbols to its left. Thus, in the rule below the effect is to set x to 1
and y to the value returned by C.




A : B
        {
                $$ = 1;
        }
    C
        {
                x = $2;
                y = $3;
        }
    ;

Actions that do not terminate a rule are handled by yacc by manufacturing
a new nonterminal symbol name and a new rule matching this name to
the empty string. The interior action is the action triggered by recognizing
this added rule. yacc treats the above example as if it had been written

$ACT : /* empty */
        {
                $$ = 1;
        }
    ;

A : B $ACT C
        {
                x = $2;
                y = $3;
        }
    ;

where $ACT is an empty action.

In many applications, output is not done directly by the actions. A data 
structure, such as a parse tree, is constructed in memory and transformations 
are applied to it before output is generated. Parse trees are particularly easy 
to construct given routines to build and maintain the tree structure desired. 
For example, suppose there is a C function node written so that the call 

node( L, n1, n2 ) 

creates a node with label L and descendants n1 and n2 and returns the
index of the newly created node. Then a parse tree can be built by supplying
actions such as

expr : expr '+' expr
        {
                $$ = node( '+', $1, $3 );
        }

in the specification. 
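
A minimal sketch of how such a node function might be written in C follows. It is not taken from yacc itself: the array-based storage, the MAXNODES limit, the NONE marker, and the layout of struct tnode are assumptions made for illustration. Only the call node( L, n1, n2 ) and the returned index come from the description above.

```c
#include <assert.h>

#define MAXNODES 1000          /* assumed capacity, not from the text */
#define NONE     (-1)          /* assumed marker for "no descendant" */

struct tnode {
    int label;                 /* L: for example, an operator character */
    int left;                  /* n1: index of first descendant, or NONE */
    int right;                 /* n2: index of second descendant, or NONE */
};

static struct tnode nodes[MAXNODES];
static int nnodes = 0;         /* number of nodes allocated so far */

/* Create a node with label L and descendants n1 and n2;
   return its index, or NONE if the table is full. */
int node(int L, int n1, int n2)
{
    /* descendants must already exist (or be NONE) */
    assert(n1 < nnodes && n2 < nnodes);

    if (nnodes >= MAXNODES)
        return NONE;
    nodes[nnodes].label = L;
    nodes[nnodes].left = n1;
    nodes[nnodes].right = n2;
    return nnodes++;
}
```

In this sketch a leaf is created by passing NONE for both descendants, and the index returned by one call can be passed as a descendant to a later call.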
The user may define other variables to be used by the actions. Declarations
and definitions can appear in the declarations section enclosed in the
marks %{ and %}. These declarations and definitions have global scope, so
they are known to the action statements and can be made known to the lex- 
ical analyzer. For example: 

%{ int variable = 0; %} 

could be placed in the declarations section making variable accessible to all 
of the actions. Users should avoid names beginning with yy because the 
yacc parser uses only such names. In the examples shown thus far all the 
values are integers. A discussion of values of other types is found in the 
section "Advanced Topics." 



Lexical Analysis 

The user must supply a lexical analyzer to read the input stream and 
communicate tokens (with values, if desired) to the parser. The lexical 
analyzer is an integer-valued function called yylex. The function returns an 
integer, the token number, representing the kind of token read. If there is a 
value associated with that token, it should be assigned to the external vari- 
able yylval. 

The parser and the lexical analyzer must agree on these token numbers
in order for communication between them to take place. The numbers may
be chosen by yacc or the user. In either case, the #define mechanism of C
language is used to allow the lexical analyzer to return these numbers
symbolically. For example, suppose that the token name DIGIT has been
defined in the declarations section of the yacc specification file. The
relevant portion of the lexical analyzer might look like

int yylex( )
{
        extern int yylval;
        int c;
        ...
        c = getchar( );
        ...
        switch (c)
        {
        ...
        case '0':
        case '1':
        ...
        case '9':
                yylval = c - '0';
                return (DIGIT);
        ...
        }
        ...
}

to return the appropriate token.

The intent is to return a token number of DIGIT and a value equal to 
the numerical value of the digit. Provided that the lexical analyzer code is 
placed in the subroutines section of the specification file, the identifier 
DIGIT is defined as the token number associated with the token DIGIT. 
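
As a hedged illustration of the points above, the following sketch shows a small but complete lexical analyzer. It returns DIGIT with the digit's value in yylval, returns any other character as its own token number, and returns 0 as the end-marker. To keep the sketch self-contained, it reads from a string through the hypothetical pointer yyinput instead of calling getchar, and it defines DIGIT and yylval itself; in a real specification both are supplied by yacc.

```c
#define DIGIT 257              /* assumed value; normally defined by yacc */

int yylval;                    /* token value; normally defined by the parser */

/* Hypothetical input buffer, used here in place of getchar(). */
static const char *yyinput = "";

int yylex(void)
{
    int c;

    /* skip blanks and tabs between tokens */
    while (*yyinput == ' ' || *yyinput == '\t')
        yyinput++;

    if (*yyinput == '\0')
        return 0;              /* end-marker: token number 0 */

    c = (unsigned char)*yyinput++;
    if (c >= '0' && c <= '9') {
        yylval = c - '0';      /* numerical value of the digit */
        return DIGIT;
    }
    return c;                  /* a literal character is its own token number */
}
```

Once the input is exhausted, every further call returns the end-marker again, so the parser can safely ask for a token more than once at end of input.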

This mechanism leads to clear, easily modified lexical analyzers. The 
only pitfall to avoid is using any token names in the grammar that are 
reserved or significant in C language or the parser. For example, the use of 
token names if or while will almost certainly cause severe difficulties when 
the lexical analyzer is compiled. The token name error is reserved for error 
handling and should not be used naively. 

In the default situation, token numbers are chosen by yacc. The default 
token number for a literal character is the numerical value of the character 
in the local character set. Other names are assigned token numbers starting 
at 257. If the yacc command is invoked with the -d option, a file called
y.tab.h is generated. y.tab.h contains #define statements for the tokens.

If the user prefers to assign the token numbers, the first appearance of
the token name or literal in the declarations section must be followed 
immediately by a nonnegative integer. This integer is taken to be the token 
number of the name or literal. Names and literals not defined this way are 
assigned default definitions by yacc. The potential for duplication exists 
here. Care must be taken to make sure that all token numbers are distinct. 

For historical reasons, the end-marker must have token number 0 or
negative. This token number cannot be redefined by the user. Thus, all
lexical analyzers should be prepared to return 0 or a negative number as a
token upon reaching the end of their input. 

A very useful tool for constructing lexical analyzers is the lex utility. 
Lexical analyzers produced by lex are designed to work in close harmony 
with yacc parsers. The specifications for these lexical analyzers use regular 
expressions instead of grammar rules. lex can be easily used to produce
quite complicated lexical analyzers, but there remain some languages (such
as FORTRAN) that do not fit any theoretical framework and whose lexical
analyzers must be crafted by hand.

yacc turns the specification file into a C language procedure, which
parses the input according to the specification given. The algorithm used to 
go from the specification to the parser is complex and will not be discussed 
here. The parser itself, though, is relatively simple and understanding its 
usage will make treatment of error recovery and ambiguities easier. 

The parser produced by yacc consists of a finite state machine with a 
stack. The parser is also capable of reading and remembering the next 
input token (called the look-ahead token). The current state is always the 
one on the top of the stack. The states of the finite state machine are given 
small integer labels. Initially, the machine is in state 0 (the stack contains
only state 0) and no look-ahead token has been read. 

The machine has only four actions available — shift, reduce, accept, and 
error. A step of the parser is done as follows: 

1. Based on its current state, the parser decides if it needs a look-ahead
token to choose the action to be taken. If it needs one and does not 
have one, it calls yylex to obtain the next token. 

2. Using the current state and the look-ahead token if needed, the 
parser decides on its next action and carries it out. This may result 
in states being pushed onto the stack or popped off of the stack and 
in the look-ahead token being processed or left alone. 

The shift action is the most common action the parser takes. Whenever 
a shift action is taken, there is always a look-ahead token. For example, in 
state 56 there may be an action 

IF shift 34 

which says, in state 56, if the look-ahead token is IF, the current state (56) is 
pushed down on the stack, and state 34 becomes the current state (on the 
top of the stack). The look-ahead token is cleared. 

The reduce action keeps the stack from growing without bounds.
reduce actions are appropriate when the parser has seen the right-hand side
of a grammar rule and is prepared to announce that it has seen an instance
of the rule, replacing the right-hand side by the left-hand side. It may be
necessary to consult the look-ahead token to decide whether or not to
reduce (usually it is not necessary). In fact, the default action (represented
by a dot) is often a reduce action.

reduce actions are associated with individual grammar rules. Grammar
rules are also given small integer numbers, and this leads to some confusion.
The action

. reduce 18

refers to grammar rule 18, while the action

IF shift 34

refers to state 34. 

Suppose the rule 

A : x y z ;

is being reduced. The reduce action depends on the left-hand symbol (A in 
this case) and the number of symbols on the right-hand side (three in this 
case). To reduce, first pop off the top three states from the stack. (In gen- 
eral, the number of states popped equals the number of symbols on the 
right side of the rule.) In effect, these states were the ones put on the stack 
while recognizing x, y, and z and no longer serve any useful purpose. 
After popping these states, a state is uncovered, which was the state the 
parser was in before beginning to process the rule. Using this uncovered 
state and the symbol on the left side of the rule, perform what is in effect a 
shift of A. A new state is obtained, pushed onto the stack, and parsing con- 
tinues. There are significant differences between the processing of the left- 
hand symbol and an ordinary shift of a token, however, so this action is 
called a goto action. In particular, the look-ahead token is cleared by a shift 
but is not affected by a goto. In any case, the uncovered state contains an 
entry such as 

A goto 20 

causing state 20 to be pushed onto the stack and become the current state. 

In effect, the reduce action turns back the clock in the parse, popping
the states off the stack to go back to the state where the right-hand side of
the rule was first seen. The parser then behaves as if it had seen the left
side at that time. If the right-hand side of the rule is empty, no states are
popped off of the stack. The uncovered state is in fact the current state.

The reduce action is also important in the treatment of user-supplied 
actions and values. When a rule is reduced, the code supplied with the rule 
is executed before the stack is adjusted. In addition to the stack holding the 
states, another stack running in parallel with it holds the values returned 
from the lexical analyzer and the actions. When a shift takes place, the
external variable yylval is copied onto the value stack. After the return
from the user code, the reduction is carried out. When the goto action is 
done, the external variable yyval is copied onto the value stack. The 
pseudo-variables $1, $2, etc., refer to the value stack. 
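
The interplay of the two stacks can be sketched in C as follows. This is an illustration only, not the code yacc generates: the array names, the MAXDEPTH limit, and the functions do_shift and do_reduce are assumptions, and the goto state is passed in by the caller rather than looked up in the uncovered state's table.

```c
#include <assert.h>

#define MAXDEPTH 150           /* assumed stack limit */

int yylval;                    /* set by the lexical analyzer */
int yyval;                     /* set by an action through $$ */

static int states[MAXDEPTH];   /* state stack; states[top] is the current state */
static int values[MAXDEPTH];   /* value stack, running in parallel */
static int top = 0;            /* initially the stack holds only state 0 */

/* shift: push the new state and copy yylval onto the value stack */
void do_shift(int newstate)
{
    assert(top + 1 < MAXDEPTH);
    top++;
    states[top] = newstate;
    values[top] = yylval;
}

/* reduce, called after the rule's action has run and set yyval:
   pop one state per right-hand-side symbol, then perform the goto,
   pushing the new state together with yyval */
void do_reduce(int nrhs, int gotostate)
{
    assert(nrhs <= top && top - nrhs + 1 < MAXDEPTH);
    top -= nrhs;               /* uncovers the state seen before the rule began */
    top++;
    states[top] = gotostate;
    values[top] = yyval;
}
```

In this sketch, while an action for a rule with three right-hand-side symbols runs, $1, $2, and $3 correspond to values[top - 2], values[top - 1], and values[top].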

The other two parser actions are conceptually much simpler. The accept 
action indicates that the entire input has been seen and that it matches the 
specification. This action appears only when the look-ahead token is the 
end-marker and indicates that the parser has successfully done its job. The 
error action, on the other hand, represents a place where the parser can no 
longer continue parsing according to the specification. The input tokens it 
has seen (together with the look-ahead token) cannot be followed by any- 
thing that would result in a legal input. The parser reports an error and 
attempts to recover the situation and resume parsing. The error recovery (as 
opposed to the detection of error) will be discussed later. 

Consider:

%token DING DONG DELL
%%
rhyme : sound place
      ;
sound : DING DONG
      ;
place : DELL
      ;

as a yacc specification.

When yacc is invoked with the -v option, a file called y.output is pro- 
duced with a human-readable description of the parser. The y.output file 
corresponding to the above grammar (with some statistics stripped off the 
end) follows.

state 0
        $accept : _rhyme $end

        DING shift 3
        . error

        rhyme goto 1
        sound goto 2

state 1
        $accept : rhyme_$end

        $end accept
        . error

state 2
        rhyme : sound_place

        DELL shift 5
        . error

        place goto 4

state 3
        sound : DING_DONG

        DONG shift 6
        . error

state 4
        rhyme : sound place_ (1)

        . reduce 1

state 5
        place : DELL_ (3)

        . reduce 3

state 6
        sound : DING DONG_ (2)

        . reduce 2

The actions for each state are specified and there is a description of the
parsing rules being processed in each state. The _ character is used to indicate
what has been seen and what is yet to come in each rule. The following
input

DING DONG DELL

can be used to track the operations of the parser. Initially, the current state
is state 0. The parser needs to refer to the input in order to decide between
the actions available in state 0, so the first token, DING, is read and becomes
the look-ahead token. The action in state 0 on DING is shift 3; state 3 is
pushed onto the stack, and the look-ahead token is cleared. State 3 becomes
the current state. The next token, DONG, is read and becomes the look-ahead
token. The action in state 3 on the token DONG is shift 6; state 6 is
pushed onto the stack, and the look-ahead is cleared. The stack now contains
0, 3, and 6. In state 6, without even consulting the look-ahead, the
parser reduces by

sound : DING DONG

which is rule 2. Two states, 6 and 3, are popped off of the stack, uncovering
state 0. Consulting the description of state 0 (looking for a goto on sound),

sound goto 2 

is obtained. State 2 is pushed onto the stack and becomes the current state. 

In state 2, the next token, DELL, must be read. The action is shift 5, so 
state 5 is pushed onto the stack, which now has 0, 2, and 5 on it, and the 
look-ahead token is cleared. In state 5, the only action is to reduce by rule 
3. This has one symbol on the right-hand side, so one state, 5, is popped 
off, and state 2 is uncovered. The goto in state 2 on place (the left side of 
rule 3) is state 4. Now, the stack contains 0, 2, and 4. In state 4, the only 
action is to reduce by rule 1. There are two symbols on the right, so the top 
two states are popped off, uncovering state 0 again. In state 0, there is a
goto on rhyme causing the parser to enter state 1. In state 1, the input is
read and the end-marker is obtained, indicated by $end in the y.output file.
The action in state 1 (when the end-marker is seen) successfully ends the 
parse. 
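
The walk-through above can be reproduced with a small hand-written driver. The sketch below is not the parser yacc generates; it transcribes the seven states of the y.output listing by hand into a switch statement. The token numbers, the enum and array names, and the function parse are assumptions made for illustration.

```c
enum { END = 0, DING = 257, DONG = 258, DELL = 259 };  /* assumed token numbers */
enum { RHYME, SOUND, PLACE };                          /* nonterminal symbols */

/* rule number -> left-hand side and right-hand-side length (rule 0 unused) */
static const int lhs[]  = { 0, RHYME, SOUND, PLACE };
static const int nrhs[] = { 0, 2, 2, 1 };

/* goto entries, exactly those appearing in the listing */
static int go_to(int state, int nonterm)
{
    if (state == 0 && nonterm == RHYME) return 1;
    if (state == 0 && nonterm == SOUND) return 2;
    if (state == 2 && nonterm == PLACE) return 4;
    return -1;
}

/* Parse a token array terminated by END; return 1 on accept, 0 on error. */
int parse(const int *tok)
{
    int stack[16], top = 0;    /* depth never exceeds 3 for this grammar */
    int rule;

    stack[0] = 0;              /* initially only state 0 */
    for (;;) {
        rule = 0;
        switch (stack[top]) {
        case 0: if (*tok != DING) return 0; stack[++top] = 3; tok++; break;
        case 1: return *tok == END;   /* accept on the end-marker, else error */
        case 2: if (*tok != DELL) return 0; stack[++top] = 5; tok++; break;
        case 3: if (*tok != DONG) return 0; stack[++top] = 6; tok++; break;
        case 4: rule = 1; break;      /* default action: reduce without */
        case 5: rule = 3; break;      /* consulting the look-ahead      */
        case 6: rule = 2; break;
        default: return 0;
        }
        if (rule != 0) {
            top -= nrhs[rule];        /* pop one state per symbol */
            stack[top + 1] = go_to(stack[top], lhs[rule]);
            top++;
        }
    }
}
```

Passing the incorrect strings suggested below to parse shows where each one is rejected: for example, DING DONG DONG fails in state 2, where only DELL can be shifted.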

The reader is urged to consider how the parser works when confronted 
with such incorrect strings as DING DONG DONG, DING DONG, DING 
DONG DELL DELL, etc. A few minutes spent with this and other simple 
examples is repaid when problems arise in more complicated contexts.

A set of grammar rules is ambiguous if there is some input string that
can be structured in two or more different ways. For example, the grammar
rule

expr : expr '-' expr

is a natural way of expressing the fact that one way of forming an arith- 
metic expression is to put two other expressions together with a minus sign 
between them. Unfortunately, this grammar rule does not completely 
specify the way that all complex inputs should be structured. For example, 
if the input is 

expr - expr - expr

the rule allows this input to be structured as either

( expr - expr ) - expr

or as

expr - ( expr - expr )

(The first is called left association, the second right association.) 
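
The two structures generally yield different values, which is why the choice matters. The following C fragment makes the difference concrete; the function names are invented for this illustration.

```c
/* left association: ( a - b ) - c */
int left_assoc(int a, int b, int c)
{
    return (a - b) - c;
}

/* right association: a - ( b - c ) */
int right_assoc(int a, int b, int c)
{
    return a - (b - c);
}
```

For the operands 10, 4, and 3, left association gives (10 - 4) - 3 = 3, while right association gives 10 - (4 - 3) = 9.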

yacc detects such ambiguities when it is attempting to build the parser. 
Given the input 

expr - expr - expr 

consider the problem that confronts the parser. When the parser has read 
the second expr, the input seen 

expr - expr

matches the right side of the grammar rule above. The parser could reduce 
the input by applying this rule. After applying the rule, the input is 
reduced to expr (the left side of the rule). The parser would then read the 
final part of the input 

- expr

and again reduce. The effect of this is to take the left associative
interpretation.

Alternatively, if the parser sees

expr - expr

it could defer the immediate application of the rule and continue reading
the input until

expr - expr - expr

is seen. It could then apply the rule to the rightmost three symbols, reducing
them to expr, which results in

expr - expr

being left. Now the rule can be reduced once more. The effect is to take
the right associative interpretation. Thus, having read

expr - expr

the parser can do one of two legal things, a shift or a reduction. It has no 
way of deciding between them. This is called a shift-reduce conflict. It 
may also happen that the parser has a choice of two legal reductions. This 
is called a reduce-reduce conflict. Note that there are never any shift-shift 
conflicts. 

When there are shift-reduce or reduce-reduce conflicts, yacc still pro- 
duces a parser. It does this by selecting one of the valid steps wherever it 
has a choice. A rule describing the choice to make in a given situation is 
called a disambiguating rule. 

yacc invokes two default disambiguating rules: 

1. In a shift-reduce conflict, the default is to do the shift.

2. In a reduce-reduce conflict, the default is to reduce by the earlier 
grammar rule (in the yacc specification). 

Rule 1 implies that reductions are deferred in favor of shifts when there 
is a choice. Rule 2 gives the user rather crude control over the behavior of 
the parser in this situation, but reduce-reduce conflicts should be avoided 
when possible. 
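
The two default rules can be expressed as a short C function. This is a sketch for illustration, not yacc's own code: the struct action type and the function resolve_default are invented here, and the sketch assumes that rules are numbered in the order in which they appear in the specification, so that the earlier rule has the smaller number.

```c
enum kind { SHIFT, REDUCE };

struct action {
    enum kind kind;
    int number;                /* target state for SHIFT, rule number for REDUCE */
};

/* Choose between two conflicting actions using the two default rules. */
struct action resolve_default(struct action a, struct action b)
{
    if (a.kind == SHIFT && b.kind == REDUCE)
        return a;              /* rule 1: do the shift */
    if (a.kind == REDUCE && b.kind == SHIFT)
        return b;
    if (a.kind == REDUCE && b.kind == REDUCE)
        return a.number <= b.number ? a : b;   /* rule 2: earlier rule */
    return a;                  /* shift-shift conflicts never arise */
}
```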

Conflicts may arise because of mistakes in input or logic or because the 
grammar rules (while consistent) require a more complex parser than yacc 
can construct. The use of actions within rules can also cause conflicts if the 
action must be done before the parser can be sure which rule is being 
recognized. In these cases, the application of disambiguating rules is
inappropriate and leads to an incorrect parser. For this reason, yacc always
reports the number of shift-reduce and reduce-reduce conflicts resolved by
Rule 1 and Rule 2. 

In general, whenever it is possible to apply disambiguating rules to pro- 
duce a correct parser, it is also possible to rewrite the grammar rules so that 
the same inputs are read but there are no conflicts. For this reason, most 
previous parser generators have considered conflicts to be fatal errors. Our 
experience has suggested that this rewriting is somewhat unnatural and 
produces slower parsers. Thus, yacc will produce parsers even in the pres- 
ence of conflicts. 

As an example of the power of disambiguating rules, consider 

stat : IF '(' cond ')' stat
     | IF '(' cond ')' stat ELSE stat
     ;

which is a fragment from a programming language involving an if-then- 
else statement. In these rules, IF and ELSE are tokens, cond is a nonterminal 
symbol describing conditional (logical) expressions, and stat is a nonterminal
symbol describing statements. The first rule will be called the simple if
rule and the second the if-else rule.

These two rules form an ambiguous construction because input of the 
form 

IF ( C1 ) IF ( C2 ) S1 ELSE S2

can be structured according to these rules in two ways

IF ( C1 )
{
        IF ( C2 )
                S1
}
ELSE
        S2

or

IF ( C1 )
{
        IF ( C2 )
                S1
        ELSE
                S2
}




where the second interpretation is the one given in most programming 
languages having this construct; each ELSE is associated with the last 
preceding un-ELSE'd IF. In this example, consider the situation where the 
parser has seen 

IF ( C1 ) IF ( C2 ) S1 

and is looking at the ELSE. It can immediately reduce by the simple if rule 
to get 

IF ( C1 ) stat

and then read the remaining input

ELSE S2

and reduce

IF ( C1 ) stat ELSE S2

by the if-else rule. This leads to the first of the above groupings of the 
input. 

On the other hand, the ELSE may be shifted, S2 read, and then the 
right-hand portion of 

IF ( C1 ) IF ( C2 ) S1 ELSE S2

can be reduced by the if-else rule to get

IF ( C1 ) stat

which can be reduced by the simple if rule. This leads to the second of the
above groupings of the input, which is usually desired.

Once again, the parser can do two valid things — there is a shift-reduce 
conflict. The application of disambiguating rule 1 tells the parser to shift in 
this case, which leads to the desired grouping. 
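
The practical difference between the two groupings can be seen by writing each one out in C with explicit braces. The function names and the return codes (1 for S1, 2 for S2, 0 for neither) are invented for this illustration.

```c
/* First grouping: the ELSE is paired with the outer IF
   (the result of reducing by the simple if rule first). */
int group_outer(int c1, int c2)
{
    if (c1) {
        if (c2)
            return 1;          /* S1 */
    } else {
        return 2;              /* S2 */
    }
    return 0;                  /* neither statement runs */
}

/* Second grouping: the ELSE is paired with the nearest IF
   (the result of shifting the ELSE). */
int group_inner(int c1, int c2)
{
    if (c1) {
        if (c2)
            return 1;          /* S1 */
        else
            return 2;          /* S2 */
    }
    return 0;                  /* neither statement runs */
}
```

The two functions agree only when both conditions are true. When C1 is false, the first grouping executes S2 and the second executes nothing; when C1 is true and C2 is false, the first executes nothing and the second executes S2.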

This shift-reduce conflict arises only when there is a particular current 
input symbol, ELSE, and particular inputs, such as 

IF ( C1 ) IF ( C2 ) S1 

have already been seen. In general, there may be many conflicts, and each 
one will be associated with an input symbol and a set of previously read 
inputs. The previously read inputs are characterized by the state of the 
parser. 

The conflict messages of yacc are best understood by examining the verbose
(-v) option output file. For example, the output corresponding to the
above conflict state might be

23: shift-reduce conflict (shift 45, reduce 18) on ELSE

state 23

        stat : IF ( cond ) stat_ (18)
        stat : IF ( cond ) stat_ELSE stat

        ELSE shift 45
        . reduce 18



where the first line describes the conflict, giving the state and the input
symbol. The ordinary state description gives the grammar rules active in
the state and the parser actions. Recall that the underline marks the portion
of the grammar rules that has been seen. Thus in the example, in state
23 the parser has seen input corresponding to

IF ( cond ) stat

and the two grammar rules shown are active at this time. The parser can do
two possible things. If the input symbol is ELSE, it is possible to shift into
state 45. State 45 will have, as part of its description, the line

stat : IF ( cond ) stat ELSE_stat

because the ELSE will have been shifted in this state. In state 23, the
alternative action (described by a dot, .) is to be done if the input symbol is not
mentioned explicitly in the actions. In this case, if the input symbol is not
ELSE, the parser reduces to

stat : IF ( cond ) stat

by grammar rule 18. 

Once again, notice that the numbers following shift commands refer to
other states, while the numbers following reduce commands refer to grammar
rule numbers. In the y.output file, the rule numbers are printed in
parentheses after those rules that can be reduced. In most states, there is
a reduce action possible in the state and this is the default command. The
user who encounters unexpected shift-reduce conflicts will probably want
to look at the verbose output to decide whether the default actions are
appropriate.

There is one common situation where the rules given above for resolving
conflicts are not sufficient. This is in the parsing of arithmetic expres-
sions. Most of the commonly used constructions for arithmetic expressions 
can be naturally described by the notion of precedence levels for operators, 
together with information about left or right associativity. It turns out that 
ambiguous grammars with appropriate disambiguating rules can be used to 
create parsers that are faster and easier to write than parsers constructed 
from unambiguous grammars. The basic notion is to write grammar rules of 
the form 

expr : expr OP expr 

and 

expr : UNARY expr

for all binary and unary operators desired. This creates a very ambiguous 
grammar with many parsing conflicts. As disambiguating rules, the user 
specifies the precedence or binding strength of all the operators and the 
associativity of the binary operators. This information is sufficient to allow 
yacc to resolve the parsing conflicts in accordance with these rules and con- 
struct a parser that realizes the desired precedences and associativities. 

The precedences and associativities are attached to tokens in the declarations
section. This is done by a series of lines beginning with a yacc keyword:
%left, %right, or %nonassoc, followed by a list of tokens. All of the
tokens on the same line are assumed to have the same precedence level and
associativity; the lines are listed in order of increasing precedence or binding
strength. Thus:

%left '+' '-'
%left '*' '/'

describes the precedence and associativity of the four arithmetic operators. 
Plus and minus are left associative and have lower precedence than star and 
slash, which are also left associative. The keyword %right is used to 
describe right associative operators, and the keyword %nonassoc is used to 
describe operators, like the operator .LT. in FORTRAN, that may not associ- 
ate with themselves. Thus: 

A .LT. B .LT. C

is illegal in FORTRAN and such an operator would be described with the
keyword %nonassoc in yacc. As an example of the behavior of these
declarations, the description

%right '='
%left '+' '-'
%left '*' '/'

%%

expr : expr '=' expr
     | expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | NAME
     ;




might be used to structure the input 

a = b = c*d - e - f*g

as follows

a = ( b = ( ((c*d)-e) - (f*g) ) )

in order to achieve the correct precedence of operators. When this mechanism
is used, unary operators must, in general, be given a precedence. Some-
times a unary operator and a binary operator have the same symbolic 
representation but different precedences. An example is unary and binary 
minus, -. 

Unary minus may be given the same strength as multiplication, or even 
higher, while binary minus has a lower strength than multiplication. The
keyword %prec changes the precedence level associated with a particular
grammar rule. The keyword %prec appears immediately after the body of
the grammar rule, before the action or closing semicolon, and is followed
by a token name or literal. It causes the precedence of the grammar rule to
become that of the following token name or literal. For example, the rules

%left '+' '-'
%left '*' '/'

%%

expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '-' expr %prec '*'
     | NAME
     ;

might be used to give unary minus the same precedence as multiplication.

A token declared by %left, %right, and %nonassoc need not be, but may
be, declared by %token as well.

Precedences and associativities are used by yacc to resolve parsing 
conflicts. They give rise to the following disambiguating rules: 

1. Precedences and associativities are recorded for those tokens and
literals that have them. 

2. A precedence and associativity is associated with each grammar rule. 
It is the precedence and associativity of the last token or literal in 
the body of the rule. If the %prec construction is used, it overrides 
this default. Some grammar rules may have no precedence and 
associativity associated with them. 

3. When there is a reduce-reduce conflict or there is a shift-reduce
conflict and either the input symbol or the grammar rule has no
precedence and associativity, then the two default disambiguating
rules given at the beginning of the section are used, and the
conflicts are reported.

4. If there is a shift-reduce conflict and both the grammar rule and the
input character have precedence and associativity associated with
them, then the conflict is resolved in favor of the action (shift or
reduce) associated with the higher precedence. If precedences are
equal, then associativity is used. Left associative implies reduce; 
right associative implies shift; nonassociating implies error. 
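
Rules 3 and 4 for shift-reduce conflicts can be summarized in a short C function. The sketch is illustrative only: the type and function names are invented, a larger level is assumed to mean a higher precedence, and level 0 is assumed to mean that no precedence was recorded.

```c
enum assoc { NO_ASSOC, LEFT, RIGHT, NONASSOC };
enum decision { DO_SHIFT, DO_REDUCE, DO_ERROR, USE_DEFAULT };

struct prec {
    int level;                 /* larger binds more tightly; 0 means none recorded */
    enum assoc assoc;
};

/* Resolve a shift-reduce conflict between the input token and a rule. */
enum decision resolve_prec(struct prec token, struct prec rule)
{
    if (token.level == 0 || rule.level == 0)
        return USE_DEFAULT;    /* rule 3: use the defaults and report */
    if (token.level > rule.level)
        return DO_SHIFT;       /* rule 4: the higher precedence wins */
    if (token.level < rule.level)
        return DO_REDUCE;
    switch (token.assoc) {     /* equal precedence: use associativity */
    case LEFT:     return DO_REDUCE;
    case RIGHT:    return DO_SHIFT;
    case NONASSOC: return DO_ERROR;
    default:       return USE_DEFAULT;
    }
}
```

For instance, with '-' at level 1 and '*' at level 2, both left associative, a parser that has seen expr - expr and is looking at '*' shifts, while one looking at another '-' reduces.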

Conflicts resolved by precedence are not counted in the number of 
shift-reduce and reduce-reduce conflicts reported by yacc. This means that 
mistakes in the specification of precedences may disguise errors in the input 
grammar. It is a good idea to be sparing with precedences and use them in 
a cookbook fashion until some experience has been gained. The y.output 
file is very useful in deciding whether the parser is actually doing what was 
intended.

Error handling is an extremely difficult area, and many of the problems
are semantic ones. When an error is found, for example, it may be neces-
sary to reclaim parse tree storage, delete or alter symbol table entries,
and/or, typically, set switches to avoid generating any further output.

It is seldom acceptable to stop all processing when an error is found. It 
is more useful to continue scanning the input to find further syntax errors. 
This leads to the problem of getting the parser restarted after an error. A 
general class of algorithms to do this involves discarding a number of 
tokens from the input string and attempting to adjust the parser so that 
input can continue. 

To allow the user some control over this process, yacc provides the 
token name error. This name can be used in grammar rules. In effect, it 
suggests places where errors are expected and recovery might take place. 
The parser pops its stack until it enters a state where the token error is 
legal. It then behaves as if the token error were the current look-ahead 
token and performs the action encountered. The look-ahead token is then 
reset to the token that caused the error. If no special error rules have been 
specified, the processing halts when an error is detected. 

In order to prevent a cascade of error messages, the parser, after detect- 
ing an error, remains in error state until three tokens have been successfully 
read and shifted. If an error is detected when the parser is already in error 
state, no message is given, and the input token is quietly deleted. 
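
The three-token rule amounts to a small counter. The following C sketch shows one way it might be kept; the variable and function names and the message text are assumptions for illustration and are not the names used inside the yacc parser.

```c
#include <stdio.h>

static int errflag = 0;        /* tokens still to be shifted before
                                  the parser leaves error state */
static int nerrs = 0;          /* number of messages actually reported */

/* Called when a syntax error is detected. Returns 1 if a message
   was reported, 0 if the parser was already in error state and the
   offending token is quietly deleted. */
int on_error(void)
{
    int reported = 0;

    if (errflag == 0) {        /* not already in error state */
        fprintf(stderr, "syntax error\n");
        nerrs++;
        reported = 1;
    }
    errflag = 3;               /* three clean shifts are needed again */
    return reported;
}

/* Called after each token is successfully read and shifted. */
void on_shift(void)
{
    if (errflag > 0)
        errflag--;
}
```

In this sketch, a second error that arrives before three tokens have been shifted produces no message and restarts the count.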

As an example, a rule of the form 

stat : error

means that on a syntax error the parser attempts to skip over the statement
in which the error is seen. More precisely, the parser scans ahead, looking
for three tokens that might legally follow a statement, and starts processing
at the first of these. If the beginnings of statements are not sufficiently dis- 
tinctive, it may make a false start in the middle of a statement and end up 
reporting a second error where there is in fact no error. 

Actions may be used with these special error rules. These actions might 
attempt to reinitialize tables, reclaim symbol table space, etc.

Error rules such as the above are very general but difficult to control.
Rules such as

stat : error ';'

are somewhat easier. Here, when there is an error, the parser attempts to 
skip over the statement but does so by skipping to the next semicolon. All 
tokens after the error and before the next semicolon cannot be shifted and 
are discarded. When the semicolon is seen, this rule will be reduced and 
any cleanup action associated with it performed. 

Another form of error rule arises in interactive applications where it 
may be desirable to permit a line to be reentered after an error. The
following example

        input   : error '\n'
                  {
                        (void) printf( "Reenter last line: " );
                  }
                  input
                  {
                        $$ = $4;
                  }
                ;

is one way to do this. There is one potential difficulty with this approach.
The parser must correctly process three input tokens before it admits that it 
has correctly resynchronized after the error. If the reentered line contains 
an error in the first two tokens, the parser deletes the offending tokens and 
gives no message. This is clearly unacceptable. For this reason, there is a 
mechanism that can force the parser to believe that error recovery has been 
accomplished. The statement 

yyerrok ; 

in an action resets the parser to its normal mode. The last example can be
rewritten as

        input   : error '\n'
                  {
                        yyerrok;
                        (void) printf( "Reenter last line: " );
                  }
                  input
                  {
                        $$ = $4;
                  }
                ;

which is somewhat better. 

As previously mentioned, the token seen immediately after the error 
symbol is the input token at which the error was discovered. Sometimes, 
this is inappropriate; for example, an error recovery action might take upon 
itself the job of finding the correct place to resume input. In this case, the 
previous look-ahead token must be cleared. The statement 

yyclearin ; 

in an action will have this effect. For example, suppose the action after 
error were to call some sophisticated resynchronization routine (supplied by 
the user) that attempted to advance the input to the beginning of the next 
valid statement. After this routine is called, the next token returned by 
yylex is presumably the first token in a legal statement. The old illegal 
token must be discarded and the error state reset. A rule similar to

        stat    : error
                  {
                        resynch();
                        yyerrok;
                        yyclearin;
                  }
                ;

could perform this. 

These mechanisms are admittedly crude but do allow for a simple, fairly 
effective recovery of the parser from many errors. Moreover, the user can 
get control to deal with the error actions required by other portions of the 
program.

When the user inputs a specification to yacc, the output is a file of C
language subroutines, called y.tab.c. The function produced by yacc is
called yyparse(); it is an integer valued function. When it is called, it in
turn repeatedly calls yylex(), the lexical analyzer supplied by the user (see
"Lexical Analysis"), to obtain input tokens. Eventually, either an error is
detected, in which case (if no error recovery is possible) yyparse() returns
the value 1, or the lexical analyzer returns the end-marker token and the
parser accepts. In this case, yyparse() returns the value 0.

The user must provide a certain amount of environment for this parser
in order to obtain a working program. For example, as with every C
language program, a routine called main() must be defined that eventually
calls yyparse(). In addition, a routine called yyerror() is needed to print a
message when a syntax error is detected.

These two routines must be supplied in one form or another by the
user. To ease the initial effort of using yacc, a library has been provided
with default versions of main() and yyerror(). The library is accessed by a
-ly argument to the cc(1) command or to the loader. The source codes

        main()
        {
                return (yyparse());
        }

and

        # include <stdio.h>

        yyerror(s)
        char *s;
        {
                (void) fprintf(stderr, "%s\n", s);
        }

show the triviality of these default programs. The argument to yyerror() is a
string containing an error message, usually the string syntax error. The
average application wants to do better than this. Ordinarily, the program
should keep track of the input line number and print it along with the
message when a syntax error is detected. The external integer variable yychar
contains the look-ahead token number at the time the error was detected.
This may be of some interest in giving better diagnostics. Since the main()
routine is probably supplied by the user (to read arguments, etc.), the yacc
library is useful only in small projects or in the earliest stages of larger
ones.
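
As a rough illustration of "doing better" than the library version, the sketch below reports the input line number along with the message. It is written in ANSI C rather than the older style used elsewhere in this chapter, and it assumes snprintf() is available. The variable lineno and the helper format_error() are hypothetical names introduced for this example; the lexical analyzer is assumed to increment lineno each time it reads a newline.

```c
#include <stdio.h>

/* Hypothetical line counter; the lexical analyzer is assumed to
 * increment it each time it reads a newline. */
int lineno = 1;

/* Build the diagnostic in a caller-supplied buffer so that it can be
 * printed or logged.  Returns the value returned by snprintf(). */
int format_error(char *buf, size_t size, const char *s)
{
    return snprintf(buf, size, "line %d: %s", lineno, s);
}

/* Replacement for the library yyerror(): report the line number
 * along with the message passed in by the parser. */
int yyerror(const char *s)
{
    char buf[256];

    format_error(buf, sizeof buf, s);
    (void) fprintf(stderr, "%s\n", buf);
    return 0;
}
```

A user-supplied yyerror() of this kind takes the place of the library version when the program is linked.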

The external integer variable yydebug is normally set to 0. If it is set to 
a nonzero value, the parser will output a verbose description of its actions 
including a discussion of the input symbols read and what the parser 
actions are. It is possible to set this variable by using sdb.

This part contains miscellaneous hints on preparing efficient, easy to
change, and clear specifications. The individual subsections are more or less 
independent. 



Input Style 

It is difficult to provide rules with substantial actions and still have a 
readable specification file. The following are a few style hints. 

1. Use all uppercase letters for token names and all lowercase letters
for nonterminal names. This is useful in debugging. 

2. Put grammar rules and actions on separate lines. It makes editing 
easier. 

3. Put all rules with the same left-hand side together. Put the left- 
hand side in only once and let all following rules begin with a vert- 
ical bar. 

4. Put a semicolon only after the last rule with a given left-hand side 
and put the semicolon on a separate line. This allows new rules to 
be easily added. 

5. Indent rule bodies by one tab stop and action bodies by two tab 
stops. 

6. Put complicated actions into subroutines defined in separate files. 

Example 1 is written following this style, as are the examples in this sec- 
tion (where space permits). The user must decide about these stylistic ques- 
tions. The central problem, however, is to make the rules visible through 
the morass of action code. 



Left Recursion 

The algorithm used by the yacc parser encourages so-called left recursive
grammar rules. Rules of the form

        name : name rest_of_rule ;

match this algorithm. These rules, such as

        list    : item
                | list ',' item
                ;

and

        seq     : item
                | seq item
                ;

frequently arise when writing specifications of sequences and lists. In each 
of these cases, the first rule will be reduced for the first item only; and the 
second rule will be reduced for the second and all succeeding items. 

With right recursive rules, such as

        seq     : item
                | item seq
                ;

the parser is a bit bigger; and the items are seen and reduced from right to 
left. More seriously, an internal stack in the parser is in danger of 
overflowing if a very long sequence is read. Thus, the user should use left 
recursion wherever reasonable. 
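
The difference in stack use can be seen by counting symbols on the parser stack. The two functions below are a simplified model written for this illustration (they are not part of yacc, and their names are assumptions); each returns the greatest stack depth reached while reading a sequence of n items, for n of at least 1.

```c
/* seq : item | seq item ;      (left recursion) */
int max_depth_left(int n)
{
    int depth = 0, max = 0, i;

    for (i = 0; i < n; i++) {
        depth++;                /* shift item */
        if (depth > max)
            max = depth;
        if (i > 0)
            depth--;            /* reduce "seq item" to seq: two symbols become one */
        /* for i == 0, "item" is reduced to seq and the depth stays at 1 */
    }
    return max;
}

/* seq : item | item seq ;      (right recursion) */
int max_depth_right(int n)
{
    int depth = 0, max = 0, i;

    for (i = 0; i < n; i++) {
        depth++;                /* every item is shifted before any "item seq" reduction */
        if (depth > max)
            max = depth;
    }
    for (i = 1; i < n; i++)
        depth--;                /* reduce "item seq" to seq, working from right to left */
    return max;
}
```

In this model the left recursive form never holds more than two symbols, while the right recursive form holds all n items before the first "item seq" reduction.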

It is worth considering if a sequence with zero elements has any meaning,
and if so, consider writing the sequence specification as

        seq     : /* empty */
                | seq item
                ;

using an empty rule. Once again, the first rule would always be reduced 
exactly once before the first item was read, and then the second rule would 
be reduced once for each item read. Permitting empty sequences often 
leads to increased generality. However, conflicts might arise if yacc is asked 
to decide which empty sequence it has seen when it hasn't seen enough to 
know!



Lexical Tie-Ins

Some lexical decisions depend on context. For example, the lexical 
analyzer might want to delete blanks normally, but not within quoted 
strings, or names might be entered into a symbol table in declarations but 
not in expressions. One way of handling these situations is to create a
global flag that is examined by the lexical analyzer and set by actions. For
example,

        %{
                int dflag;
        %}
          ... other declarations ...

        %%

        prog    : decls stats
                ;

        decls   : /* empty */
                  {
                        dflag = 1;
                  }
                | decls declaration
                ;

        stats   : /* empty */
                  {
                        dflag = 0;
                  }
                | stats statement
                ;

          ... other rules ...

specifies a program that consists of zero or more declarations followed by
zero or more statements. The flag dflag is now 0 when reading statements
and 1 when reading declarations, except for the first token in the first
statement. This token must be seen by the parser before it can tell that the
declaration section has ended and the statements have begun. In many
cases, this single token exception does not affect the lexical scan.

This kind of back-door approach can be elaborated to a noxious degree. 
Nevertheless, it represents a way of doing some things that are difficult, if 
not impossible, to do otherwise. 
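
On the lexical analyzer's side, the flag might be consulted along the following lines. This is a sketch under stated assumptions: the symbol table, its size, and the functions lookup() and see_name() are hypothetical and are introduced only to show a name being entered in declarations and merely looked up in statements.

```c
#include <string.h>

#define MAXSYM 64

int dflag = 1;                          /* 1 in declarations, 0 in statements; set by actions */

static char symtab[MAXSYM][32];         /* hypothetical symbol table */
static int nsyms = 0;

/* Return the table index of name, or -1 if it is absent. */
int lookup(const char *name)
{
    int i;

    for (i = 0; i < nsyms; i++)
        if (strcmp(symtab[i], name) == 0)
            return i;
    return -1;
}

/* Called by the lexical analyzer for each name it reads.  In
 * declarations a new name is entered; in statements it is only
 * looked up, so an undeclared name yields -1. */
int see_name(const char *name)
{
    int i = lookup(name);

    if (i < 0 && dflag && nsyms < MAXSYM) {
        strncpy(symtab[nsyms], name, sizeof symtab[0] - 1);
        symtab[nsyms][sizeof symtab[0] - 1] = '\0';
        i = nsyms++;
    }
    return i;
}
```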



Reserved Words 

Some programming languages permit you to use words like if, which 
are normally reserved as label or variable names, provided that such use 
does not conflict with the legal use of these names in the programming 
language. This is extremely hard to do in the framework of yacc. It is 
difficult to pass information to the lexical analyzer telling it this instance of 
if is a keyword and that instance is a variable. The user can make a stab at 
it using the mechanism described in the last subsection, but it is difficult. 

A number of ways of making this easier are under advisement. Until 
then, it is better that the keywords be reserved, i.e., forbidden for use as 
variable names. There are powerful stylistic reasons for preferring this. 






Advanced Topics 

This part discusses a number of advanced features of yacc. 



Simulating error and accept in Actions 

The parsing actions of error and accept can be simulated in an action by
use of macros YYACCEPT and YYERROR. The YYACCEPT macro causes
yyparse() to return the value 0; YYERROR causes the parser to behave as if
the current input symbol had been a syntax error; yyerror() is called, and
error recovery takes place. These mechanisms can be used to simulate
parsers with multiple end-markers or context sensitive syntax checking.



Accessing Values in Enclosing Rules 

An action may refer to values returned by actions to the left of the 
current rule. The mechanism is simply the same as with ordinary actions, a 
dollar sign followed by a digit.

        sent    : adj noun verb adj noun
                  {
                        look at the sentence ...
                  }
                ;

        adj     : THE
                  {
                        $$ = THE;
                  }
                | YOUNG
                  {
                        $$ = YOUNG;
                  }
                ...
                ;

        noun    : DOG
                  {
                        $$ = DOG;
                  }
                | CRONE
                  {
                        if ( $0 == YOUNG )
                        {
                                (void) printf( "what?\n" );
                        }
                        $$ = CRONE;
                  }
                ;
                ...

In this case, the digit may be 0 or negative. In the action following the
word CRONE, a check is made that the preceding token shifted was not 
YOUNG. Obviously, this is only possible when a great deal is known about 
what might precede the symbol noun in the input. There is also a dis- 
tinctly unstructured flavor about this. Nevertheless, at times this mechan- 
ism prevents a great deal of trouble especially when a few combinations are 
to be excluded from an otherwise regular structure.



Support for Arbitrary Value Types

By default, the values returned by actions and the lexical analyzer are
integers. yacc can also support values of other types including structures.
In addition, yacc keeps track of the types and inserts appropriate union
member names so that the resulting parser is strictly type checked. The yacc
value stack is declared to be a union of the various types of values desired.
The user declares the union and associates union member names with each
token and nonterminal symbol having a value. When the value is referenced
through a $$ or $n construction, yacc will automatically insert the
appropriate union name so that no unwanted conversions take place. In
addition, type checking commands such as lint are far more silent.

There are three mechanisms used to provide for this typing. First, there 
is a way of defining the union. This must be done by the user since other 
subroutines, notably the lexical analyzer, must know about the union 
member names. Second, there is a way of associating a union member 
name with tokens and nonterminals. Finally, there is a mechanism for 
describing the type of those few values where yacc cannot easily determine 
the type. 

To declare the union, the user includes

        %union
        {
                body of union ...
        }

in the declaration section. This declares the yacc value stack and the
external variables yylval and yyval to have type equal to this union. If yacc
was invoked with the -d option, the union declaration is copied onto the
y.tab.h file as YYSTYPE.
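
In C terms, the effect is roughly that of the declarations below. This is only a sketch: the member names are borrowed from Example 2, the helper functions set_const() and width() are hypothetical, and the exact text that yacc writes to y.tab.h may differ.

```c
typedef struct interval {
    double lo, hi;
} INTERVAL;

/* The union of all value types the grammar uses. */
typedef union {
    int ival;           /* register index */
    double dval;        /* floating point value */
    INTERVAL vval;      /* interval value */
} YYSTYPE;

YYSTYPE yylval;         /* set by the lexical analyzer before it returns a token */

/* What a lexical analyzer might do for a floating point constant:
 * store the value through the member that matches the token's type. */
void set_const(double d)
{
    yylval.dval = d;
}

/* A reference such as $1 to a symbol tagged <vval> becomes an access to
 * the vval member of a value stack entry; this helper mimics that access. */
double width(YYSTYPE v)
{
    return v.vval.hi - v.vval.lo;
}
```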

Once YYSTYPE is defined, the union member names must be associated 
with the various terminal and nonterminal names. The construction 

<name> 

is used to indicate a union member name. If this follows one of the
keywords %token, %left, %right, and %nonassoc, the union member name is
associated with the tokens listed. Thus, saying

        %left <optype> '+' '-'

causes any reference to values returned by these two tokens to be tagged
with the union member name optype. Another keyword, %type, is used to 
associate union member names with nonterminals. Thus, one might say 

%type <nodetype> expr stat 

to associate the union member nodetype with the nonterminal symbols 
expr and stat. 

There remain a couple of cases where these mechanisms are insufficient. 
If there is an action within a rule, the value returned by this action has no a 
priori type. Similarly, reference to left context values (such as $0) leaves 
yacc with no easy way of knowing the type. In this case, a type can be 
imposed on the reference by inserting a union member name between <
and > immediately after the first $. The example

        rule    : aaa
                  {
                        $<intval>$ = 3;
                  }
                  bbb
                  {
                        fun( $<intval>2, $<other>0 );
                  }
                ;

shows this usage. This syntax has little to recommend it, but the situation 
arises rarely. 

A sample specification is given in Example 2. The facilities in this sub- 
section are not triggered until they are used. In particular, the use of %type 
will turn on these mechanisms. When they are used, there is a fairly strict 
level of checking. For example, use of $n or $$ to refer to something with 
no defined type is diagnosed. If these facilities are not triggered, the yacc 
value stack is used to hold ints.



yacc Input Syntax

This section has a description of the yacc input syntax as a yacc 
specification. Context dependencies, etc. are not considered. Ironically, 
although yacc accepts an LALR(1) grammar, the yacc input specification
language is most naturally specified as an LR(2) grammar; the sticky part 
comes when an identifier is seen in a rule immediately following an action. 
If this identifier is followed by a colon, it is the start of the next rule; other- 
wise, it is a continuation of the current rule, which just happens to have an 
action embedded in it. As implemented, the lexical analyzer looks ahead 
after seeing an identifier and decides whether the next token (skipping 
blanks, newlines, and comments, etc.) is a colon. If so, it returns the token
C_IDENTIFIER. Otherwise, it returns IDENTIFIER. Literals (quoted strings)
are also returned as IDENTIFIERs but never as part of C_IDENTIFIERs.
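
The look-ahead described above might be sketched in C as follows. This is an illustration working on a string rather than on an input stream; the function classify() and the token numbers are assumptions made for the example and are not taken from the yacc source.

```c
#include <ctype.h>

enum { IDENTIFIER = 257, C_IDENTIFIER = 258 };  /* token numbers chosen for this sketch */

/* p points just past an identifier in the input text.  Skip blanks,
 * newlines, and comments, then report whether a colon follows. */
int classify(const char *p)
{
    for (;;) {
        while (isspace((unsigned char) *p))
            p++;
        if (p[0] == '/' && p[1] == '*') {       /* skip a comment */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;                         /* step over the closing mark */
            continue;
        }
        break;
    }
    return (*p == ':') ? C_IDENTIFIER : IDENTIFIER;
}
```

Because the decision is made in the lexical analyzer, the grammar itself never needs more than one token of look-ahead.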



                /* grammar for the input to yacc */

                /* basic entries */

        %token  IDENTIFIER      /* includes identifiers and literals */
        %token  C_IDENTIFIER    /* identifier (but not literal) followed by a : */
        %token  NUMBER          /* [0-9]+ */

                /* reserved words: %type=>TYPE %left=>LEFT, etc. */

        %token  LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION

        %token  MARK            /* the %% mark */
        %token  LCURL           /* the %{ mark */
        %token  RCURL           /* the %} mark */

                /* ASCII character literals stand for themselves */

        %start  spec

        %%

        spec    : defs MARK rules tail
                ;

        tail    : MARK
                  {
                        In this action, eat up the rest of the file
                  }
                | /* empty: the second MARK is optional */
                ;

        defs    : /* empty */
                | defs def
                ;

        def     : START IDENTIFIER
                | UNION
                  {
                        Copy union definition to output
                  }
                | LCURL
                  {
                        Copy C code to output file
                  }
                  RCURL
                | rword tag nlist
                ;

        rword   : TOKEN
                | LEFT
                | RIGHT
                | NONASSOC
                | TYPE
                ;

        tag     : /* empty: union tag is optional */
                | '<' IDENTIFIER '>'
                ;

        nlist   : nmno
                | nlist nmno
                | nlist ',' nmno
                ;

        nmno    : IDENTIFIER            /* Note: literal illegal with % type */
                | IDENTIFIER NUMBER     /* Note: illegal with % type */
                ;

                /* rule section */

        rules   : C_IDENTIFIER rbody prec
                | rules rule
                ;

        rule    : C_IDENTIFIER rbody prec
                | '|' rbody prec
                ;

        rbody   : /* empty */
                | rbody IDENTIFIER
                | rbody act
                ;

        act     : '{'
                  {
                        Copy action translate $$ etc.
                  }
                  '}'
                ;

        prec    : /* empty */
                | PREC IDENTIFIER
                | PREC IDENTIFIER act
                | prec ';'
                ;



1. A Simple Example



This example gives the complete yacc application for a small desk
calculator; the calculator has 26 registers labeled a through z and accepts
arithmetic expressions made up of the operators

        +, -, *, /, % (mod operator), & (bitwise and), | (bitwise or),
        and assignments.

If an expression at the top level is an assignment, only the assignment is
done; otherwise, the expression is printed. As in the C language, an integer
that begins with 0 (zero) is assumed to be octal; otherwise, it is assumed to
be decimal.

As an example of a yacc specification, the desk calculator does a reason- 
able job of showing how precedence and ambiguities are used and demon- 
strates simple recovery. The major oversimplifications are that the lexical 
analyzer is much simpler than for most applications, and the output is pro- 
duced immediately line by line. Note the way that decimal and octal 
integers are read in by grammar rules. This job is probably better done by 
the lexical analyzer. 
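
If the conversion were moved into the lexical analyzer, it might look something like the following sketch. The function read_number() is hypothetical; it works on a string rather than on the input stream and, like the grammar rules in the example, it does not reject the digits 8 and 9 in an octal number.

```c
#include <ctype.h>

/* Convert the digits at the start of s using the convention the
 * calculator follows: a leading 0 selects octal, anything else decimal.
 * The number of characters consumed is stored in *len. */
int read_number(const char *s, int *len)
{
    int base, value = 0, n = 0;

    base = (s[0] == '0') ? 8 : 10;
    while (isdigit((unsigned char) s[n])) {
        value = base * value + (s[n] - '0');    /* same step as the grammar action */
        n++;
    }
    *len = n;
    return value;
}
```

The lexical analyzer would then return a single token for the whole number, with the converted value in yylval, and the two number rules would no longer be needed.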



%{
# include <stdio.h>
# include <ctype.h>

int regs[26];
int base;
%}

%start list

%token DIGIT LETTER

%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS    /* supplies precedence for unary minus */

%%      /* beginning of rules section */

list    : /* empty */
        | list stat '\n'
        | list error '\n'
          {
                yyerrok;
          }
        ;

stat    : expr
          {
                (void) printf( "%d\n", $1 );
          }
        | LETTER '=' expr
          {
                regs[$1] = $3;
          }
        ;

expr    : '(' expr ')'
          {
                $$ = $2;
          }
        | expr '+' expr
          {
                $$ = $1 + $3;
          }
        | expr '-' expr
          {
                $$ = $1 - $3;
          }
        | expr '*' expr
          {
                $$ = $1 * $3;
          }
        | expr '/' expr
          {
                $$ = $1 / $3;
          }
        | expr '%' expr
          {
                $$ = $1 % $3;
          }
        | expr '&' expr
          {
                $$ = $1 & $3;
          }
        | expr '|' expr
          {
                $$ = $1 | $3;
          }
        | '-' expr %prec UMINUS
          {
                $$ = -$2;
          }
        | LETTER
          {
                $$ = regs[$1];
          }
        | number
        ;

number  : DIGIT
          {
                $$ = $1; base = ($1==0) ? 8 : 10;
          }
        | number DIGIT
          {
                $$ = base * $1 + $2;
          }
        ;

%%      /* beginning of subroutines section */

int yylex( )    /* lexical analysis routine */
{               /* return LETTER for lowercase letter, */
                /* yylval = 0 through 25 */
                /* returns DIGIT for digit, yylval = 0 through 9 */
                /* all other characters are returned immediately */

        int c;

                /* skip blanks */
        while ((c = getchar()) == ' ')
                ;

                /* c is now nonblank */

        if (islower(c))
        {
                yylval = c - 'a';
                return (LETTER);
        }
        if (isdigit(c))
        {
                yylval = c - '0';
                return (DIGIT);
        }
        return (c);
}



2. An Advanced Example 

This section gives an example of a grammar using some of the advanced
features. The desk calculator example in Example 1 is modified to provide a
desk calculator that does floating point interval arithmetic. The calculator
understands floating point constants; the arithmetic operations +, -, *, /,
and unary -; and the scalar variables a through z. Moreover, it also
understands intervals written

        (X,Y)

where X is less than or equal to Y. There are 26 interval valued variables A
through Z that may also be used. The usage is similar to that in Example 1; 
assignments return no value and print nothing while expressions print the 
(floating or interval) value.

This example explores a number of interesting features of yacc and C.
Intervals are represented by a structure consisting of the left and right
endpoint values stored as doubles. This structure is given a type name,
INTERVAL, by using typedef. The yacc value stack can also contain floating
point scalars and integers (used to index into the arrays holding the variable
values). Notice that the entire strategy depends strongly on being able to
assign structures and unions in C language. In fact, many of the actions call
functions that return structures as well.

It is also worth noting the use of YYERROR to handle error conditions:
division by an interval containing 0 and an interval presented
in the wrong order. The error recovery mechanism of yacc is used to throw
away the rest of the offending line.

In addition to the mixing of types on the value stack, this grammar also 
demonstrates an interesting use of syntax to keep track of the type (for 
example, scalar or interval) of intermediate expressions. Note that scalar 
can be automatically promoted to an interval if the context demands an 
interval value. This causes a large number of conflicts when the grammar is 
run through yacc: 18 shift-reduce and 26 reduce-reduce. The problem can 
be seen by looking at the two input lines

        2.5 + (3.5 - 4.)

and 

2.5 + (3.5, 4) 

Notice that the 2.5 is to be used in an interval value expression in the 
second example, but this fact is not known until the comma is read. By this 
time, 2.5 is finished, and the parser cannot go back and change its mind. 
More generally, it might be necessary to look ahead an arbitrary number of 
tokens to decide whether to convert a scalar to an interval. This problem is 
evaded by having two rules for each binary interval valued operator— one 
when the left operand is a scalar and one when the left operand is an inter- 
val. In the second case, the right operand must be an interval, so the 
conversion will be applied automatically. Despite this evasion, there are 
still many cases where the conversion may be applied or not, leading to the 
above conflicts. They are resolved by listing the rules that yield scalars first 
in the specification file; in this way, the conflict will be resolved in the 
direction of keeping scalar valued expressions scalar valued until they are 
forced to become intervals.

This way of handling multiple types is very instructive. If there were
many kinds of expression types instead of just two, the number of rules 
needed would increase dramatically and the conflicts even more dramati- 
cally. Thus, while this example is instructive, it is better practice in a more 
normal programming language environment to keep the type information 
as part of the value and not as part of the grammar. 

Finally, a word about the lexical analysis. The only unusual feature is 
the treatment of floating point constants. The C language library routine 
atof is used to do the actual conversion from a character string to a
double-precision value. If the lexical analyzer detects an error, it responds 
by returning a token that is illegal in the grammar provoking a syntax error 
in the parser and thence error recovery. 



%{

#include <stdio.h>
#include <ctype.h>

typedef struct interval
{
        double lo, hi;
} INTERVAL;

INTERVAL vmul(), vdiv();

double atof();

double dreg[26];
INTERVAL vreg[26];

%}

%start lines

%union
{
        int ival;
        double dval;
        INTERVAL vval;
}

%token <ival> DREG VREG /* indices into dreg, vreg arrays */

%token <dval> CONST     /* floating point constant */

%type <dval> dexp       /* expression */

%type <vval> vexp       /* interval expression */

        /* precedence information about the operators */

%left '+' '-'
%left '*' '/'
%left UMINUS    /* precedence for unary minus */

%%      /* beginning of rules section */

lines   : /* empty */
        | lines line
        ;

line    : dexp '\n'
          {
                (void) printf("%15.8f\n", $1);
          }
        | vexp '\n'
          {
                (void) printf("(%15.8f, %15.8f)\n", $1.lo, $1.hi);
          }
        | DREG '=' dexp '\n'
          {
                dreg[$1] = $3;
          }
        | VREG '=' vexp '\n'
          {
                vreg[$1] = $3;
          }
        | error '\n'
          {
                yyerrok;
          }
        ;

dexp    : CONST
        | DREG
          {
                $$ = dreg[$1];
          }
        | dexp '+' dexp
          {
                $$ = $1 + $3;
          }
        | dexp '-' dexp
          {
                $$ = $1 - $3;
          }
        | dexp '*' dexp
          {
                $$ = $1 * $3;
          }
        | dexp '/' dexp
          {
                $$ = $1 / $3;
          }
        | '-' dexp %prec UMINUS
          {
                $$ = -$2;
          }
        | '(' dexp ')'
          {
                $$ = $2;
          }
        ;

vexp    : dexp
          {
                $$.hi = $$.lo = $1;
          }
        | '(' dexp ',' dexp ')'
          {
                $$.lo = $2;
                $$.hi = $4;
                if ( $$.lo > $$.hi )
                {
                        (void) printf("interval out of order\n");
                        YYERROR;
                }
          }
        | VREG
          {
                $$ = vreg[$1];
          }
        | vexp '+' vexp
          {
                $$.hi = $1.hi + $3.hi;
                $$.lo = $1.lo + $3.lo;
          }
        | dexp '+' vexp
          {
                $$.hi = $1 + $3.hi;
                $$.lo = $1 + $3.lo;
          }
        | vexp '-' vexp
          {
                $$.hi = $1.hi - $3.lo;
                $$.lo = $1.lo - $3.hi;
          }
        | dexp '-' vexp
          {
                $$.hi = $1 - $3.lo;
                $$.lo = $1 - $3.hi;
          }
        | vexp '*' vexp
          {
                $$ = vmul( $1.lo, $1.hi, $3 );
          }
        | dexp '*' vexp
          {
                $$ = vmul( $1, $1, $3 );
          }
        | vexp '/' vexp
          {
                if ( dcheck( $3 ) ) YYERROR;
                $$ = vdiv( $1.lo, $1.hi, $3 );
          }
        | dexp '/' vexp
          {
                if ( dcheck( $3 ) ) YYERROR;
                $$ = vdiv( $1, $1, $3 );
          }
        | '-' vexp %prec UMINUS
          {
                $$.hi = -$2.lo; $$.lo = -$2.hi;
          }
        | '(' vexp ')'
          {
                $$ = $2;
          }
        ;

%%      /* beginning of subroutines section */

# define BSZ 50         /* buffer size for floating point number */

        /* lexical analysis */

int yylex( )
{
        register int c;

                /* skip over blanks */
        while ((c = getchar()) == ' ')
                ;

        if (isupper(c))
        {
                yylval.ival = c - 'A';
                return (VREG);
        }
        if (islower(c))
        {
                yylval.ival = c - 'a';
                return (DREG);
        }

                /* gobble up digits, points, exponents */

        if (isdigit(c) || c == '.')
        {
                char buf[BSZ+1], *cp = buf;
                int dot = 0, exp = 0;

                for(; (cp - buf) < BSZ ; ++cp, c = getchar())
                {
                        *cp = c;
                        if (isdigit(c))
                                continue;
                        if (c == '.')
                        {
                                if (dot++ || exp)
                                        return ('.');   /* will cause syntax error */
                                continue;
                        }
                        if( c == 'e')
                        {
                                if (exp++)
                                        return ('e');   /* will cause syntax error */
                                continue;
                        }
                        /* end of number */
                        break;
                }
                *cp = '\0';
                if (cp - buf >= BSZ)
                        (void) printf("constant too long -- truncated\n");
                else
                        ungetc(c, stdin);       /* push back last char read */
                yylval.dval = atof(buf);
                return (CONST);
        }
        return (c);
}

INTERVAL
hilo(a, b, c, d)
double a, b, c, d;
{
        /* returns the smallest interval containing a, b, c, and d */
        /* used by *,/ routine */
        INTERVAL v;

        if (a > b)
        {
                v.hi = a;
                v.lo = b;
        }
        else
        {
                v.hi = b;
                v.lo = a;
        }
        if (c > d)
        {
                if (c > v.hi)
                        v.hi = c;
                if (d < v.lo)
                        v.lo = d;
        }
        else
        {
                if (d > v.hi)
                        v.hi = d;
                if (c < v.lo)
                        v.lo = c;
        }
        return (v);
}

INTERVAL
vmul(a, b, v)
double a, b;
INTERVAL v;
{
        return (hilo(a * v.hi, a * v.lo, b * v.hi, b * v.lo));
}

dcheck(v)
INTERVAL v;
{
        if (v.hi >= 0. && v.lo <= 0.)
        {
                (void) printf("divisor interval contains 0.\n");
                return (1);
        }
        return (0);
}

INTERVAL
vdiv(a, b, v)
double a, b;
INTERVAL v;
{
        return (hilo(a / v.hi, a / v.lo, b / v.hi, b / v.lo));
}



FILE AND RECORD LOCKING



Introduction 



Mandatory and advisory file and record locking both are available on 
current releases of the UNIX system. The intent of this capability is to
provide a synchronization mechanism for programs accessing the same stores
of data simultaneously. Such processing is characteristic of many multi-user
applications, and the need for a standard method of dealing with the
problem has been recognized by standards advocates like /usr/group, an
organization of UNIX system users from businesses and campuses across the
country.

Advisory file and record locking can be used to coordinate self- 
synchronizing processes. In mandatory locking, the standard I/O subrou- 
tines and I/O system calls enforce the locking protocol. In this way, at the 
cost of a little efficiency, mandatory locking double checks the programs 
against accessing the data out of sequence. 

The remainder of this chapter describes how file and record locking 
capabilities can be used. Examples are given for the correct use of record 
locking. Misconceptions about the amount of protection that record locking 
affords are dispelled. Record locking should be viewed as a synchronization 
mechanism, not a security mechanism. 

The manual pages for the fcntl(2) system call, the lockf(3) library
function, and fcntl(5) data structures and commands are referred to throughout
this section. You should read them before continuing.

Before discussing how record locking should be used, let us first define
a few terms. 

Record 

A contiguous set of bytes in a file. The UNIX operating system does 
not impose any record structure on files. This may be done by the 
programs that use the files. 

Cooperating Processes 

Processes that work together in some well defined fashion to accom- 
plish the tasks at hand. Processes that share files must request per- 
mission to access the files before using them. File access permis- 
sions must be carefully set to restrict non-cooperating processes 
from accessing those files. The term process will be used inter- 
changeably with cooperating process to refer to a task obeying such 
protocols. 

Read (Share) Locks 

These are used to gain limited access to sections of files. When a 
read lock is in place on a record, other processes may also read lock 
that record, in whole or in part. No other process, however, may 
have or obtain a write lock on an overlapping section of the file. If 
a process holds a read lock it may assume that no other process will 
be writing or updating that record at the same time. This access 
method also permits many processes to read the given record. This 
might be necessary when searching a file, without the contention 
involved if a write or exclusive lock were to be used. 

Write (Exclusive) Locks 

These are used to gain complete control over sections of files. When 
a write lock is in place on a record, no other process may read or 
write lock that record, in whole or in part. If a process holds a 
write lock it may assume that no other process will be reading or 
writing that record at the same time. 

Advisory Locking 

A form of record locking that does not interact with the I/O subsystem
(i.e., creat(2), open(2), read(2), and write(2)). The control over
records is accomplished by requiring an appropriate record lock
request before I/O operations. If appropriate requests are always
made by all processes accessing the file, then the accessibility of the
file will be controlled by the interaction of these requests. Advisory
locking depends on the individual processes to enforce the record 
locking protocol; it does not require an accessibility check at the 
time of each I/O request. 

Mandatory Locking 

A form of record locking that does interact with the I/O subsystem. 
Access to locked records is enforced by the creat(2), open(2), read(2), 
and write(2) system calls. If a record is locked, then access of that 
record by any other process is restricted according to the type of 
lock on the record. The control over records should still be per- 
formed explicitly by requesting an appropriate record lock before 
I/O operations, but an additional check is made by the system 
before each I/O operation to ensure the record locking protocol is 
being honored. Mandatory locking offers an extra synchronization 
check, but at the cost of some additional system overhead. 

There are access permissions for UNIX system files to control who may
read, write, or execute such a file. These access permissions may only be set 
by the owner of the file or by the superuser. The permissions of the direc- 
tory in which the file resides can also affect the ultimate disposition of a 
file. Note that if the directory permissions allow anyone to write in it, then 
files within the directory may be removed, even if those files do not have 
read, write or execute permission for that user. Any information that is 
worth protecting, is worth protecting properly. If your application warrants 
the use of record locking, make sure that the permissions on your files and 
directories are set properly. A record lock, even a mandatory record lock, 
will only protect the portions of the files that are locked. Other parts of 
these files might be corrupted if proper precautions are not taken. 

Only a known set of programs and/or administrators should be able to
read or write a data base. This can be done easily by setting the set-group-
ID bit (see chmod(1)) of the data base accessing programs. The files can
then be accessed by a known set of programs that obey the record locking
protocol. An example of such file protection, although record locking is not
used, is the mail(1) command. In that command only the particular user
and the mail command can read and write in the unread mail files.



Opening a File for Record Locking 

The first requirement for locking a file or segment of a file is having a 
valid open file descriptor. If read locks are to be done, then the file must be 
opened with at least read accessibility and likewise for write locks and write 
accessibility. For our example we will open our file for both read and write 
access: 

#include <stdio.h>
#include <errno.h>
#include <fcntl.h>

int fd;            /* file descriptor */
char *filename;

main(argc, argv)
int argc;
char *argv[];
{
    extern void exit(), perror();

    /* get data base file name from command line and open the
     * file for read and write access.
     */
    if (argc < 2) {
        (void) fprintf(stderr, "usage: %s filename\n", argv[0]);
        exit(2);
    }
    filename = argv[1];
    fd = open(filename, O_RDWR);
    if (fd < 0) {
        perror(filename);
        exit(2);
    }




The file is now open for us to perform both locking and I/O functions. 
We then proceed with the task of setting a lock. 






Setting a File Lock 

There are several ways for us to set a lock on a file. In part, these 
methods depend upon how the lock interacts with the rest of the program. 
There are also questions of performance as well as portability. Two 
methods will be given here, one using the fcntl(2) system call, the other
using the /usr/group standards compatible lockf(3) library function call.

Locking an entire file is just a special case of record locking. For both 
these methods the concept and the effect of the lock are the same. The file 
is locked starting at a byte offset of zero (0) until the end of the maximum 
file size. This point extends beyond any real end of the file so that no lock 
can be placed on this file beyond this point. To do this the value of the size 
of the lock is set to zero. The code using the fcntl(2) system call is as fol- 
lows: 








#include <fcntl.h>
#define MAX_TRY 10
int try;
struct flock lck;

try = 0;

/* set up the record locking structure, the address of which
 * is passed to the fcntl system call.
 */
lck.l_type = F_WRLCK;    /* setting a write lock */
lck.l_whence = 0;        /* offset l_start from beginning of file */
lck.l_start = 0L;
lck.l_len = 0L;          /* until the end of the file address space */

/* Attempt locking MAX_TRY times before giving up.
 */
while (fcntl(fd, F_SETLK, &lck) < 0) {
    if (errno == EAGAIN || errno == EACCES) {
        /* there might be other error cases in which
         * you might try again.
         */
        if (++try < MAX_TRY) {
            (void) sleep(2);
            continue;
        }
        (void) fprintf(stderr, "File busy try again later!\n");
        return;
    }
    perror("fcntl");
    exit(2);
}

This portion of code tries to lock a file. This is attempted several times
until one of the following things happens: 

■ the file is locked 

■ an error occurs 

■ it gives up trying because MAX_TRY has been exceeded 
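
The three outcomes listed above can be reported to a caller instead of being handled in line. The following is a sketch only: lock_whole_file and its return codes are illustrative conventions, not part of fcntl(2), and the variable is named tries rather than try so that the fragment also compiles where try is reserved.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_TRY 10

/* Try to write-lock the whole file, retrying while another process
 * holds a conflicting lock.
 * Returns  0 when the file is locked,
 *          1 when the file stayed busy for MAX_TRY attempts,
 *         -1 when any other error occurs (errno is left set).
 */
int lock_whole_file(int fd, unsigned int delay_seconds)
{
    struct flock lck;
    int tries = 0;

    lck.l_type = F_WRLCK;
    lck.l_whence = SEEK_SET;
    lck.l_start = 0;
    lck.l_len = 0;              /* zero length: to the end of the file address space */

    while (fcntl(fd, F_SETLK, &lck) < 0) {
        if (errno != EAGAIN && errno != EACCES)
            return -1;          /* an error occurred */
        if (++tries >= MAX_TRY)
            return 1;           /* gave up trying */
        (void) sleep(delay_seconds);
    }
    return 0;                   /* the file is locked */
}
```

Returning a status leaves the choice between retrying, warning the user, or exiting to the calling program.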

To perform the same task using the lockf(3) function, the code is as fol- 
lows: 

#include <unistd.h>
#define MAX_TRY 10
int try;
try = 0;

/* make sure the file pointer
 * is at the beginning of the file.
 */
lseek(fd, 0L, 0);

/* Attempt locking MAX_TRY times before giving up.
 */
while (lockf(fd, F_TLOCK, 0L) < 0) {
    if (errno == EAGAIN || errno == EACCES) {
        /* there might be other error cases in which
         * you might try again.
         */
        if (++try < MAX_TRY) {
            sleep(2);
            continue;
        }
        (void) fprintf(stderr, "File busy try again later!\n");
        return;
    }
    perror("lockf");
    exit(2);
}

It should be noted that the lockf(3) example appears to be simpler, but
the fcntl(2) example exhibits additional flexibility. Using the fcntl(2)
method, it is possible to set the type and start of the lock request simply by
setting a few structure variables. lockf(3) merely sets write (exclusive)
locks; an additional system call (lseek(2)) is required to specify the start of
the lock.


Setting and Removing Record Locks

Locking a record is done the same way as locking a file except for the 
differing starting point and length of the lock. We will now try to solve an 
interesting and real problem. There are two records (these records may be 
in the same or different file) that must be updated simultaneously so that 
other processes get a consistent view of this information. (This type of 
problem comes up, for example, when updating the interrecord pointers in 
a doubly linked list.) To do this you must decide the following questions: 

■ What do you want to lock? 

■ For multiple locks, what order do you want to lock and unlock the 
records? 

■ What do you do if you succeed in getting all the required locks? 

■ What do you do if you fail to get all the locks? 

In managing record locks, you must plan a failure strategy if one cannot 
obtain all the required locks. It is because of contention for these records 
that we have decided to use record locking in the first place. Different pro- 
grams might: 

■ wait a certain amount of time, and try again 

■ abort the procedure and warn the user 

■ let the process sleep until signaled that the lock has been freed 

■ some combination of the above 

Let us now look at our example of inserting an entry into a doubly 
linked list. For the example, we will assume that the record after which the 
new record is to be inserted has a read lock on it already. The lock on this 
record must be changed or promoted to a write lock so that the record may 
be edited. 

Promoting a lock (generally from read lock to write lock) is permitted if 
no other process is holding a read lock in the same section of the file. If 
there are processes with pending write locks that are sleeping on the same 
section of the file, the lock promotion succeeds and the other (sleeping) 
locks wait. Promoting (or demoting) a write lock to a read lock carries no 
restrictions. In either case, the lock is merely reset with the new lock type. 
Because the /usr/group lockf function does not have read locks, lock promo-
tion is not applicable to that call. An example of record locking with lock
promotion follows:



struct record {
    /* ... data portion of record ... */
    long prev;    /* index to previous record in the list */
    long next;    /* index to next record in the list */
};

/* Lock promotion using fcntl(2)
 * When this routine is entered it is assumed that there are read
 * locks on "here" and "next".
 * If write locks on "here" and "next" are obtained:
 *    Set a write lock on "this".
 *    Return index to "this" record.
 * If any write lock is not obtained:
 *    Restore read locks on "here" and "next".
 *    Remove all other locks.
 *    Return a -1.
 */
long
set3lock (this, here, next)
long this, here, next;
{
    struct flock lck;

    lck.l_type = F_WRLCK;    /* setting a write lock */
    lck.l_whence = 0;        /* offset l_start from beginning of file */
    lck.l_start = here;
    lck.l_len = sizeof(struct record);

    /* promote lock on "here" to write lock */
    if (fcntl(fd, F_SETLKW, &lck) < 0) {
        return (-1);
    }
    /* lock "this" with write lock */
    lck.l_start = this;
    if (fcntl(fd, F_SETLKW, &lck) < 0) {
        /* Lock on "this" failed;
         * demote lock on "here" to read lock.
         */
        lck.l_type = F_RDLCK;
        lck.l_start = here;
        (void) fcntl(fd, F_SETLKW, &lck);
        return (-1);
    }
    /* promote lock on "next" to write lock */
    lck.l_start = next;
    if (fcntl(fd, F_SETLKW, &lck) < 0) {
        /* Lock on "next" failed;
         * demote lock on "here" to read lock,
         */
        lck.l_type = F_RDLCK;
        lck.l_start = here;
        (void) fcntl(fd, F_SETLK, &lck);
        /* and remove lock on "this".
         */
        lck.l_type = F_UNLCK;
        lck.l_start = this;
        (void) fcntl(fd, F_SETLK, &lck);
        return (-1);    /* cannot set lock, try again or quit */
    }

    return (this);
}

The locks on these three records were all set to wait (sleep) if another
process was blocking them from being set. This was done with the
F_SETLKW command. If the F_SETLK command was used instead, the fcntl
system calls would fail if blocked. The program would then have to be
changed to handle the blocked condition in each of the error return sec-
tions.

Let us now look at a similar example using the lockf function. Since
there are no read locks, all (write) locks will be referenced generically as
locks.

/* Lock promotion using lockf(3)
 * When this routine is entered it is assumed that there are
 * no locks on "here" and "next".
 * If locks are obtained:
 *    Set a lock on "this".
 *    Return index to "this" record.
 * If any lock is not obtained:
 *    Remove all other locks.
 *    Return a -1.
 */

#include <unistd.h>

long
set3lock (this, here, next)
long this, here, next;
{
    /* lock "here" */
    (void) lseek(fd, here, 0);
    if (lockf(fd, F_LOCK, sizeof(struct record)) < 0) {
        return (-1);
    }
    /* lock "this" */
    (void) lseek(fd, this, 0);
    if (lockf(fd, F_LOCK, sizeof(struct record)) < 0) {
        /* Lock on "this" failed.
         * Clear lock on "here".
         */
        (void) lseek(fd, here, 0);
        (void) lockf(fd, F_ULOCK, sizeof(struct record));
        return (-1);
    }
    /* lock "next" */
    (void) lseek(fd, next, 0);
    if (lockf(fd, F_LOCK, sizeof(struct record)) < 0) {
        /* Lock on "next" failed.
         * Clear lock on "here",
         */
        (void) lseek(fd, here, 0);
        (void) lockf(fd, F_ULOCK, sizeof(struct record));
        /* and remove lock on "this".
         */
        (void) lseek(fd, this, 0);
        (void) lockf(fd, F_ULOCK, sizeof(struct record));
        return (-1);    /* cannot set lock, try again or quit */
    }

    return (this);
}




Locks are removed in the same manner as they are set, only the lock 
type is different (F_UNLCK or F_ULOCK). An unlock cannot be blocked by 
another process and will only affect locks that were placed by this process. 
The unlock only affects the section of the file defined in the previous exam-
ple by lck. It is possible to unlock or change the type of lock on a subsec-
tion of a previously set lock. This may cause an additional lock (two locks
for one system call) to be used by the operating system. This occurs if the 
subsection is from the middle of the previously set lock. 
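
The splitting of a lock can be observed from a second process. The sketch below assumes a POSIX-style system with fork(2), pipe(2) and waitpid; set_lock and probe_from_child are hypothetical helper names introduced for illustration. Because locks belong to a process, a forked child sees its parent's locks as belonging to someone else and can report them with F_GETLK.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set, change or clear a lock on [start, start + len) without waiting. */
static int set_lock(int fd, short type, off_t start, off_t len)
{
    struct flock lck;

    lck.l_type = type;
    lck.l_whence = SEEK_SET;
    lck.l_start = start;
    lck.l_len = len;
    return fcntl(fd, F_SETLK, &lck);
}

/* Ask, from a forked child, which lock (if any) would block a write
 * lock on [start, start + len). The answer is copied back to the
 * parent through a pipe. Returns 0 and fills *out, or -1 on failure.
 */
int probe_from_child(int fd, off_t start, off_t len, struct flock *out)
{
    int p[2];
    pid_t pid;
    ssize_t n;

    if (pipe(p) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        (void) close(p[0]);
        (void) close(p[1]);
        return -1;
    }
    if (pid == 0) {             /* child: test the region and report */
        struct flock lck;

        lck.l_type = F_WRLCK;
        lck.l_whence = SEEK_SET;
        lck.l_start = start;
        lck.l_len = len;
        if (fcntl(fd, F_GETLK, &lck) < 0)
            _exit(1);
        if (write(p[1], &lck, sizeof lck) != (ssize_t) sizeof lck)
            _exit(1);
        _exit(0);
    }
    (void) close(p[1]);
    n = read(p[0], out, sizeof *out);
    (void) close(p[0]);
    (void) waitpid(pid, (int *) 0, 0);
    return n == (ssize_t) sizeof *out ? 0 : -1;
}
```

If the parent write-locks bytes 0 through 99 and then unlocks bytes 40 through 59, probing the three sub-ranges from the child should show a lock of length 40 at offset 0, no lock in the middle, and a second lock of length 40 at offset 60.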



Getting Lock Information 

One can determine which processes, if any, are blocking a lock from 
being set. This can be used as a simple test or as a means to find locks on a 
file. A lock is set up as in the previous examples and the F_GETLK com- 
mand is used in the fcntl call. If the lock passed to fcntl would be blocked, 
the first blocking lock is returned to the process through the structure 
passed to fcntl. That is, the lock data passed to fcntl is overwritten by 
blocking lock information. This information includes two pieces of data 
that have not been discussed yet, l_pid and l_sysid, that are only used by
F_GETLK. (For systems that do not support a distributed architecture the
value in l_sysid should be ignored.) These fields uniquely identify the pro-
cess holding the lock.

If a lock passed to fcntl using the F_GETLK command would not be
blocked by another process' lock, then the l_type field is changed to 
F_UNLCK and the remaining fields in the structure are unaffected. Let us 
use this capability to print all the segments locked by other processes. Note 
that if there are several read locks over the same segment only one of these 
will be found. 



struct flock lck;

/* Find and print "write lock" blocked segments of this file. */
(void) printf("sysid pid type start length\n");
lck.l_whence = 0;
lck.l_start = 0L;
lck.l_len = 0L;
do {
    lck.l_type = F_WRLCK;
    (void) fcntl(fd, F_GETLK, &lck);
    if (lck.l_type != F_UNLCK) {
        (void) printf("%5d %5d %c %8d %8d\n",
            lck.l_sysid,
            lck.l_pid,
            (lck.l_type == F_WRLCK) ? 'W' : 'R',
            lck.l_start,
            lck.l_len);
        /* if this lock goes to the end of the address
         * space, no need to look further, so break out.
         */
        if (lck.l_len == 0)
            break;
        /* otherwise, look for new lock after the one
         * just found.
         */
        lck.l_start += lck.l_len;
    }
} while (lck.l_type != F_UNLCK);

fcntl with the F_GETLK command will always return correctly (that is,
it will not sleep or fail) if the values passed to it as arguments are valid. 

The lockf function with the F_TEST command can also be used to test if 
there is a process blocking a lock. This function does not, however, return 
the information about where the lock actually is and which process owns 
the lock. A routine using lockf to test for a lock on a file follows: 



/* find a blocked record. */
/* seek to beginning of file */
(void) lseek(fd, 0L, 0);

/* set the size of the test region to zero (0)
 * to test until the end of the file address space.
 */
if (lockf(fd, F_TEST, 0L) < 0) {
    switch (errno) {
    case EACCES:
    case EAGAIN:
        (void) printf("file is locked by another process\n");
        break;
    case EBADF:
        /* bad argument passed to lockf */
        perror("lockf");
        break;
    default:
        (void) printf("lockf: unknown error <%d>\n", errno);
        break;
    }
}



When a process forks, the child receives a copy of the file descriptors 
that the parent has opened. The parent and child also share a common file 
pointer for each file. If the parent were to seek to a point in the file, the 
child's file pointer would also be at that location. This feature has impor- 
tant implications when using record locking. The current value of the file 
pointer is used as the reference for the offset of the beginning of the lock,
as described by l_start, when using an l_whence value of 1. If both the
parent and child process set locks on the same file, there is a possibility that
a lock will be set using a file pointer that was reset by the other process.
This problem appears in the lockf(3) function call as well and is a result of
the /usr/group requirements for record locking. If forking is used in a
record locking program, the child process should close and reopen the file if
either locking method is used. This will result in the creation of a new and
separate file pointer that can be manipulated without this problem occur-
ring. Another solution is to use the fcntl system call with an l_whence value
of 0 or 2. This makes the locking function atomic, so that even processes
sharing file pointers can be locked without difficulty.
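
The shared file pointer is easy to observe. The sketch below assumes a POSIX-style system with waitpid and the SEEK_SET and SEEK_CUR constants; offset_after_child_seek is a hypothetical name used only for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that seeks fd to child_offset, wait for the child to
 * exit, and return the offset the parent then finds on the same
 * descriptor. Returns (off_t) -1 on error.
 */
off_t offset_after_child_seek(int fd, off_t child_offset)
{
    pid_t pid = fork();

    if (pid < 0)
        return (off_t) -1;
    if (pid == 0) {             /* child: move the shared file pointer */
        (void) lseek(fd, child_offset, SEEK_SET);
        _exit(0);
    }
    if (waitpid(pid, (int *) 0, 0) < 0)
        return (off_t) -1;
    return lseek(fd, (off_t) 0, SEEK_CUR);  /* parent: report current position */
}
```

The value returned to the parent is the offset chosen by the child, which is why a lock request that depends on the current file pointer can land in the wrong place when both processes use the descriptor.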



Deadlock Handling 

There is a certain level of deadlock detection/avoidance built into the
record locking facility. This deadlock handling provides the same level of
protection granted by the /usr/group standard lockf call. This deadlock
detection is only valid for processes that are locking files or records on a
single system. Deadlocks can only potentially occur when the system is
about to put a record locking system call to sleep. A search is made for con-
straint loops of processes that would cause the system call to sleep
indefinitely. If such a situation is found, the locking system call will fail
and set errno to the deadlock error number (EDEADLK). If a process wishes
to avoid the use of the system's deadlock detection it should set its locks
using F_SETLK instead of F_SETLKW.

The use of mandatory locking is not recommended for reasons that will
be made clear in a subsequent section. Whether or not locks are enforced
by the I/O system calls is determined, at the time the calls are made, by the
state of the permissions on the file (see chmod(2)). For locks to be under
mandatory enforcement, the file must be a regular file with the set-group-ID 
bit on and the group execute permission off. If either condition fails, all 
record locks are advisory. Mandatory enforcement can be assured by the 
following code: 



#include <sys/types.h>
#include <sys/stat.h>

int mode;
struct stat buf;

if (stat(filename, &buf) < 0) {
    perror("program");
    exit(2);
}
/* get currently set mode */
mode = buf.st_mode;
/* remove group execute permission from mode */
mode &= ~(S_IEXEC>>3);
/* set 'set group id bit' in mode */
mode |= S_ISGID;
if (chmod(filename, mode) < 0) {
    perror("program");
    exit(2);
}

Files that are to be record locked should never have any type of execute
permission set on them. This is because the operating system does not obey 
the record locking protocol when executing a file. 

The chmod(1) command can also be easily used to set a file to have
mandatory locking. This can be done with the command:

chmod +l filename

The ls(1) command was also changed to show this setting when you ask for
the long listing format:

ls -l filename

causes the following to be printed:

-rw---l--- 1 abc other 1048576 Dec 3 11:44 filename



Caveat Emptor — Mandatory Locking 

■ Mandatory locking only protects those portions of a file that are 
locked. Other portions of the file that are not locked may be accessed 
according to normal UNIX system file permissions.

■ If multiple reads or writes are necessary for an atomic transaction, 
the process should explicitly lock all such pieces before any I/O 
begins. Thus advisory enforcement is sufficient for all programs that 
perform in this way. 

■ As stated earlier, arbitrary programs should not have unrestricted 
access permission to files that are important enough to record lock. 

■ Advisory locking is more efficient because a record lock check does 
not have to be performed for every I/O request. 

Record Locking and Future Releases of the UNIX System

Provisions have been made for file and record locking in a UNIX system 
environment. In such an environment the system on which the locking 
process resides may be remote from the system on which the file and record 
locks reside. In this way multiple processes on different systems may put 
locks upon a single file that resides on one of these or yet another system. 
The record locks for a file reside on the system that maintains the file. It is 
also important to note that deadlock detection/avoidance is only determined
by the record locks being held by and for a single system. Therefore, it is 
necessary that a process only hold record locks on a single system at any 
given time for the deadlock mechanism to be effective. If a process needs to 
maintain locks over several systems, it is suggested that the process avoid 
the sleep-when-blocked features of fcntl or lockf and that the process 
maintain its own deadlock detection. If the process uses the sleep-when- 
blocked feature, then a timeout mechanism should be provided by the pro- 
cess so that it does not hang waiting for a lock to be cleared. 

Introduction



Efficient use of disk storage space, memory, and computing power is 
becoming increasingly important. A shared library can offer savings in all 
three areas. For example, if constructed properly, a shared library can make 
a.out files (executable object files) smaller on disk storage and processes 
(a.out files that are executing) smaller in memory. 

The first part of this chapter, "Using a Shared Library," is designed to 
help you use UNIX System V shared libraries. It describes what a shared 
library is and how to use one to build a.out files. It also offers advice about 
when and when not to use a shared library and how to determine whether 
an a.out uses a shared library. 

The second part in this chapter, "Building a Shared Library," describes 
how to build a shared library. You do not need to read this part to use 
shared libraries. It addresses library developers, advanced programmers 
who are expected to build their own shared libraries. Specifically, this part 
describes how to use the UNIX system tool mkshlib(l) (documented in the 
Programmer's Reference Manual) and how to write C code for shared libraries 
on a UNIX system. An example is included. This part also describes how 
to use the tool chkshlib(l), which helps you check the compatibility of ver- 
sions of shared libraries. Read this part of the chapter only if you have to 
build a shared library. 



NOTE 



Shared libraries are a new feature of UNIX System V Release 3.0. An 
executable object file that needs shared libraries will not run on previ- 
ous releases of UNIX System V. 

If you are accustomed to using libraries to build your applications pro-
grams, shared libraries should blend into your work easily. This part of the 
chapter explains what shared libraries are and how and when to use them 
on the UNIX system. 



What is a Shared Library? 

A shared library is a file containing object code that several a.out files 
may use simultaneously while executing. When a program is compiled or 
link edited with a shared library, the library code that defines the program's 
external references is not copied into the program's object file. Instead, a 
special section called .lib that identifies the library code is created in the 
object file. When the UNIX system executes the resulting a.out file, it uses 
the information in this section to bring the required shared library code 
into the address space of the process. 

The implementation behind these concepts is a shared library with two 
pieces. The first, called the host shared library, is an archive that the link 
editor searches to resolve user references and to create the .lib section in 
a.out files. The structure and operation of this archive is the same as any 
archive without shared library members. For simplicity, however, in this 
chapter references to archives mean archive libraries without shared library 
members. 

The second part of a shared library is the target shared library. This is 
the file that the UNIX system uses when running a.out files built with the 
host shared library. It contains the actual code for the routines in the 
library. Naturally, it must be present on the system where the a.out
files will be run. 

A shared library offers several benefits by not copying code into a.out 
files. It can 

■ save disk storage space 

Because shared library code is not copied into all the a.out files that 
use the code, these files are smaller and use less disk space. 

■ save memory 

By sharing library code at run time, the dynamic memory needs of 
processes are reduced. 

■ make executable files using library code easier to maintain

As mentioned above, shared library code is brought into a process' 
address space at run time. Updating a shared library effectively 
updates all executable files that use the library, because the operating 
system brings the updated version into new processes. If an error in 
shared library code is fixed, all processes automatically use the 
corrected code. 

Archive libraries cannot, of course, offer this benefit: changes to 
archive libraries do not affect executable files, because code from the 
libraries is copied to the files during link editing, not during execu- 
tion. 

"Deciding Whether to Use a Shared Library" in this chapter describes shared 
libraries in more detail. 



The UNIX System Shared Libraries 

For the 3B2 Computer, AT&T provides the C target shared library with 
UNIX System V Release 3.0 and later releases and the C host shared library 
with C Programming Language Utilities 4.0 and later. The networking
library included with the Networking Support Utilities is also a shared 
library. Other shared libraries may be available now from software vendors 
and in the future from AT&T. 



Shared                 Host Library            Target Library
Library                Command Line Option     Pathname

C Library              -lc_s                   /shlib/libc_s
Networking Library     -lnsl_s                 /shlib/libnsl_s



Notice the _s suffix on the library names; we use it to identify both host 
and target shared libraries. For example, it distinguishes the standard relo- 
catable C library libc from the shared C library libc_s. The _s also indicates 
that the libraries are statically linked. 






The relocatable C library is still available with releases of the C Pro- 
gramming Language Utilities; this library is searched by default during the 
compilation or link editing of C programs. All other archive libraries from 
previous releases of the system are also available. Just as you use the 
archive libraries' names, you must use a shared library's name when you 
want to use it to build your a.out files. You tell the link editor its name 
with the -l option, as shown below.



Building an a.out File 

You direct the link editor to search a shared library the same way you 
direct a search of an archive library on the UNIX system: 

cc file.c -o file . . . -llibrary_file

To direct a search of the networking library, for example, you use the 
following command line. 

cc file.c -o file . . . -lnsl_s . . .

And to link all the files in your current directory together with the 
shared C library you'd use the following command line: 

cc *.c -lc_s

Normally, you should include the -lc_s argument after all other -l
arguments on a command line. The shared C library will then be treated 
like the relocatable C library, which is searched by default after all other 
libraries specified on a command line are searched. 

A shared library might be built with references to other shared libraries. 
That is, the first shared library might contain references to symbols that are 
resolved in a second shared library. In this case, both libraries must be 
given on the cc command line, in the order of the dependencies. 

For example, if the library libX.a references symbols in the shared C 
library, the command line would be: 

cc *.c -lX_s -lc_s

Notice that the shared library containing the references to symbols must
be listed on the command line before the shared library needed to resolve 
those references. For more information on inter-library dependencies, see 
the section "Referencing Symbols in a Shared Library from Another Shared 
Library" later in this chapter. 



Coding an Application 

Application source code in C or assembly language is compatible with 
both archive libraries and shared libraries. As a result, you should not have 
to change the code in any applications you already have when you use a 
shared library with them. When coding a new application for use with a 
shared library, you should just observe your standard coding conventions. 

However, do keep the following two points in mind, which apply when 
using either an archive or a shared library: 

■ Don't define symbols in your application with the same names as 
those in a library. 

Although there are exceptions, you should avoid redefining standard 
library routines, such as printf(3S) and strcmp(3C). Replacements 
that are incompatibly defined can cause any library, shared or 
unshared, to behave incorrectly. 

■ Don't use undocumented archive routines. 

Use only the functions and data mentioned on the manual pages 
describing the routines in Section 3 of the Programmer's Reference 
Manual. For example, don't try to outsmart the ctype design by 
manipulating the underlying implementation. 



Deciding Whether to Use a Shared Library 

You should base your decision to use a shared library on whether it 
saves space in disk storage and memory for your program. A well-designed 
shared library almost always saves space. So, as a general rule, use a shared 
library when it is available. 

To determine what savings are gained from using a shared library, you
might build the same application with both an archive and a shared library,
assuming both kinds are available. Remember that you can do this
because source code is compatible between shared libraries and archive
libraries. (See the above section "Coding an Application.") Then compare
the two versions of the application for size and performance. For example:



$ cat hello.c
main()
{
    printf("Hello\n");
}
$ cc -o unshared hello.c
$ cc -o shared hello.c -lc_s
$ size unshared shared
unshared: 8680 + 1388 + 2248 = 12316
shared: 300 + 680 + 2248 = 3228



If the application calls only a few library members, it is possible that 
using a shared library could take more disk storage or memory. The follow- 
ing section gives a more detailed discussion about when a shared library 
does and does not save space. 

When making your decision about using shared libraries, also remember 
that they are not available on UNIX System V releases prior to Release 3.0.
If your program must run on previous releases, you will need to use archive 
libraries. 



More About Saving Space 

This section is designed to help you better understand why your pro- 
grams will usually benefit from using a shared library. It explains 

■ how shared libraries save space that archive libraries cannot 

■ how shared libraries are implemented on the UNIX system

■ how shared libraries might increase space usage 

How Shared Libraries Save Space 

To better understand how a shared library saves space, we need to com- 
pare it to an archive library. 

A host shared library resembles an archive library in three ways. First, 
as noted earlier, both are archive files. Second, the object code in the 
library typically defines commonly used text symbols and data symbols. 
The symbols defined inside and made visible outside the library are external 
symbols. Note that the library may also have imported symbols, symbols 
that it uses but usually does not define. Third, the link editor searches the 
library for these symbols when linking a program to resolve its external 
references. By resolving the references, the link editor produces an execut- 
able version of the program, the a.out file. 



NOTE 



Note that the link editor on the UNIX system is a static linking tool; 
static linking requires that all symbolic references in a program be 
resolved before the program may be executed. The link editor uses 
static linking with both an archive library and a shared library. 



Although these similarities exist, a shared library differs significantly 
from an archive library. The major differences relate to how the libraries 
are handled to resolve symbolic references, a topic already discussed briefly. 

Consider how the UNIX system handles both types of libraries during 
link editing. To produce an a.out file using an archive library, the link edi- 
tor copies the library code that defines a program's unresolved external 
reference from the library into appropriate .text and .data sections in the 
program's object file. In contrast, to produce an a.out file using a shared 
library, the link editor copies from the shared library into the program's 
object file only a small amount of code for initialization of imported sym- 
bols. (See the section "Importing Symbols" later in the chapter for more 
details on imported symbols.) For the bulk of the library code, it creates a 
special section called .lib in the file that identifies the library code needed 
at run time and resolves the external references to shared library symbols 
with their correct values. When the UNIX system executes the resulting 
a.out file, it uses the information in the .lib section to bring the required 
shared library code into the address space of the process. 

Figure 8-1 depicts the a.out files produced using a regular archive
version and a shared version of the standard C library to compile the
following program:



    main()
    {
            ...
            printf("How do you like this manual?\n");
            ...
            result = strcmp("I do.", answer);
            ...
    }

Notice that the shared version is smaller. Figure 8-2 depicts the process 
images in memory of these two files when they are executed. 

    a.out Using Archive Library          a.out Using Shared Library

    FILE HEADER                          FILE HEADER
    program .text                        program .text
    library .text for printf(3S)         program .data
      and strcmp(3C) [1]                 .lib [2]
    program .data                        SYMBOL TABLE
    library .data for printf(3S)         STRING TABLE
      and strcmp(3C) [1]
    SYMBOL TABLE
    STRING TABLE

    [1] Copied to file by the link editor.
    [2] Created by the link editor. Refers to library code for
        printf(3S) and strcmp(3C).

Figure 8-1: a.out Files Created Using an Archive Library and a Shared
Library



Now consider what happens when several a.out files need the same 
code from a library. When using an archive library, each file gets its own 
copy of the code. This results in duplication of the same code on the disk 
and in memory when the a.out files are run as processes. In contrast, when 
a shared library is used, the library code remains separate from the code in 
the a.out files, as indicated in Figure 8-2. This separation enables all 
processes using the same shared library to reference a single copy of the 
code. 

    [Figure: the address space of the archive version and of the
    shared version. In the shared version, the library code referred
    to by .lib is brought into the process' address space and may be
    brought into other processes simultaneously.]

Figure 8-2: Processes Using an Archive and a Shared Library



How Shared Libraries Are Implemented 

Now that you have a better understanding of how shared libraries save 
space, you need to consider their implementation on the UNIX system to 
understand how they might increase space usage (this seldom happens).
The following paragraphs describe host and target shared libraries, the 
branch table, and then how shared libraries might increase space usage. 

The Host Library and Target Library 

As previously mentioned, every shared library has two parts: the host 
library used for linking that resides on the host machine and the target 
library used for execution that resides on the target machine. The host 
machine is the machine on which you build an a.out file; the target 
machine is the machine on which you run the file. Of course, the host and 
target may be the same machine, but they don't have to be. 

The host library is just like an archive library. Each of its members (typ- 
ically a complete object file) defines some text and data symbols in its sym- 
bol table. The link editor searches this file when a shared library is used 
during the compilation or link editing of a program. 
The search is for definitions of symbols referenced in the program but
not defined there. However, as mentioned earlier, the link editor does not 
copy the library code defining the symbols into the program's object file. 
Instead, it uses the library members to locate the definitions and then places 
symbols in the file that tell where the library code is. The result is the spe- 
cial section in the a.out file mentioned earlier (see the section "What is a 
Shared Library?") and shown in Figure 8-1 as .lib. 

The target library used for execution resembles an a.out file. The UNIX 
operating system reads this file during execution if a process needs a shared 
library. The special .lib section in the a.out file tells which shared libraries 
are needed. When the UNIX system executes the a.out file, it uses this sec- 
tion to bring the appropriate library code into the address space of the pro- 
cess. In this way, before the process starts to run, all required library code 
has been made available. 

Shared libraries enable the sharing of .text sections in the target library, 
which is where text symbols are defined. Although processes that use the 
shared library have their own virtual address spaces, they share a single 
physical copy of the library's text among them. That is, the UNIX system 
uses the same physical code for each process that attaches a shared library's 
text. 

The target library cannot share its .data sections. Each process using 
data from the library has its own private data region (contiguous area of vir- 
tual address space that mirrors the .data section of the target library). 
Processes that share text do not share data and stack area so that they do not 
interfere with one another. 

As suggested above, the target library is a lot like an a.out file, which 
can also share its text, but not its data. Also, a process must have execute 
permission for a target library to execute an a.out file that uses the library. 

The Branch Table 

When the link editor resolves an external reference in a program, it gets 
the address of the referenced symbol from the host library. This is because 
a static linking loader like ld binds symbols to addresses during link
editing. In this way, the a.out file for the program has an address for each
referenced symbol. 

What happens if library code is updated and the address of a symbol
changes? Nothing happens to an a.out file built with an archive library, 
because that file already has a copy of the code defining the symbol. (Even 
though it isn't the updated copy, the a.out file will still run.) However, the 
change can adversely affect an a.out file built with a shared library. This 
file has only a symbol telling where the required library code is. If the 
library code were updated, the location of that code might change. There- 
fore, if the a.out file ran after the change took place, the operating system 
could bring in the wrong code. To keep the a.out file current, you might 
have to recompile a program that uses a shared library after each library 
update. 

To prevent the need to recompile, a shared library is implemented with 
a branch table on the UNIX system. A branch table associates text symbols
with absolute addresses that do not change even when library code is 
changed. Each address labels a jump instruction to the address of the code 
that defines a symbol. Instead of being directly associated with the 
addresses of code, text symbols have addresses in the branch table. 

Figure 8-3 shows two a.out files executing that make a call to printf(3S). 
The process on the left was built using an archive library. It already has a 
copy of the library code defining the printf(3S) symbol. The process on the 
right was built using a shared library. This file references an absolute 
address (10) in the branch table of the shared library at run time; at this 
address, a jump instruction references the needed code. 

    [Figure: two executing a.out files that call printf(3S), one built
    with an archive library and one built with a shared library. Label
    on the shared library: "A shared library uses a branch table."]

Figure 8-3: A Branch Table in a Shared Library



Data symbols do not have a mechanism to prevent a change of address 
between shared libraries. The tool chkshlib(1) compares a.out files with a
shared library to check compatibility and help you decide if the files need
to be recompiled. See "Checking Versions of Shared Libraries Using
chkshlib(1)."

How Shared Libraries Might Increase Space Usage

A target library might add space to a process. Recall from "How Shared 
Libraries are Implemented" in this chapter that a shared library's target file 
may have both text and data regions connected to a process. While the text 
region is shared by all processes that use the library, the data region is not. 
Every process that uses the library gets its own private copy of the entire 
library data region. Naturally, this region adds to the process's memory 
requirements. As a result, if an application uses only a small part of a
shared library's text and data, executing the application might require more 
memory with a shared library than without one. For example, it would be 
unwise to use the shared C library to access only strcmp(3C). Although 
sharing strcmp(3C) saves disk storage and memory, the memory cost for 
sharing all the shared C library's private data region outweighs the savings. 
The archive version of the library would be more appropriate. 

A host library might add space to an a.out file. Recall that UNIX System V
Release 3.0 uses static linking, which requires that all external references
in a program be resolved before it is executed. Also recall that a
shared library may have imported symbols, which are used but not defined
by the library. To resolve these references, the link editor has to add
initialization code defining the referenced imported symbols to the a.out
file. This code increases the size of the a.out file.



Identifying a.out Files that Use Shared Libraries

Suppose you have an executable file and you want to know whether it 
uses a shared library. You can use the dump(1) command (documented in
the Programmer's Reference Manual) to look at the section headers for the file:

    dump -hv a.out

If the file has a .lib section, a shared library is needed. If the a.out does
not have a .lib section, it does not use shared libraries.

With a little more work, you can even tell what libraries a file uses by 
looking at the .lib section contents. 

    dump -L a.out



Debugging a.out Files that Use Shared Libraries 

sdb reads the shared libraries' symbol tables and performs as docu- 
mented (in the Programmer's Reference Manual) using the available debug- 
ging information. The branch table is hidden so that functions in shared 
libraries can be referenced by their names, and the M command lists, 
among other information, the names of shared libraries' target files used by 
the executable file. 

Shared library data are not dumped to core files, however. So, if you
encounter an error that results in a core dump and does not appear to be in 
your application code, you may find debugging easier if you rebuild the 
application with the archive version of the library used. 



Building a Shared Library

This part of the chapter explains how to build a shared library. It covers
the major steps in the building process, the use of the UNIX system tool
mkshlib(1) that builds the host and target libraries, and some guidelines for
writing shared library code. There is an example at the end of this part 
which demonstrates the major features of mkshlib and the steps in the 
building process. 

This part assumes that you are an advanced C programmer faced with 
the task of building a shared library. It also assumes you are familiar with 
the archive library building process. You do not need to read this part of 
the chapter if you only plan to use the UNIX system shared libraries or 
other shared libraries that have already been built. 



The Building Process 

To build a shared library on the UNIX system, you have to complete six 
major tasks: 

■ Choose region addresses. 

■ Choose the pathname for the shared library target file. 

■ Select the library contents. 

■ Rewrite existing library code to be included in the shared library. 

■ Write the library specification file. 

■ Use the mkshlib tool to build the host and target libraries. 

Each of these tasks is discussed below.

Step 1: Choosing Region Addresses 

The first thing you need to do is choose region addresses for your 
shared library. 

Shared library regions on the AT&T 3B2 Computer correspond to
memory management unit (MMU) segments, each of which is 128 KB.
The following table gives a list of the segment assignments on the 3B2 
Computer (as of the copyright date for this guide) and shows what virtual 
addresses are available for libraries you might build. 

Start Address              Contents                           Target Pathname

0x80000000 - 0x803E0000    Reserved for AT&T
                             UNIX Shared C Library            /shlib/libc_s
                             AT&T Networking Library          /shlib/libnsl_s
0x80400000 - 0x80420000    Generic Database Library           Unassigned
0x80440000 - 0x80460000    Generic Statistical Library        Unassigned
0x80480000 - 0x804A0000    Generic User Interface Library     Unassigned
0x804C0000 - 0x804E0000    Generic Screen Handling Library    Unassigned
0x80500000 - 0x80520000    Generic Graphics Library           Unassigned
0x80540000 - 0x80560000    Generic Networking Library         Unassigned
0x80580000 - 0x80660000    Generic - to be defined            Unassigned
0x80680000 - 0x807E0000    For private use                    Unassigned



What does this table tell you? First, the AT&T shared C library and the 
networking library reside at the pathnames given above and use addresses 
in the range reserved for AT&T. If you build a shared library that uses 
reserved addresses you run the risk of conflicting with future AT&T pro- 
ducts. 

Second, a number of segments are allocated for shared libraries that pro- 
vide various services such as graphics, database access, and so on. These 
categories are intended to reduce the chance of address conflicts among 
commercially available libraries. Although two libraries of the same type 
may conflict, that doesn't matter. A single process should not usually need 
to use two shared libraries of the same type. If the need arises, a program 
can use one shared library and one archive library. 

NOTE

Any number of libraries can use the same virtual addresses, even on the
same machine. Conflicts occur only within a single process, not among
separate processes. Thus two shared libraries can have the same region
addresses without causing problems, as long as a single a.out file doesn't
need to use both libraries.



Third, several segments are reserved for private use. If you are building 
a large system with many a.out files and processes, shared libraries might
improve its performance. As long as you don't intend to release the shared
libraries as separate products, you should use the private region addresses.
You can put your shared libraries into these segments and avoid conflicting
with commercial shared libraries. You should also use the private areas
when you will own all the a.out files that access your shared library. Don't
risk address conflicts. 



NOTE

If you plan to build a commercial shared library, you are strongly
encouraged to provide a compatible, relocatable archive as well. Some of
your customers might not find the shared library appropriate for their
applications. Others might want their applications to run on versions of
the UNIX system without shared library support.



Step 2: Choosing the Target Library Pathname 

After you choose the region addresses for your shared library, you 
should choose the pathname for the target library. We chose /shlib/libc_s 
for the shared C library and /shlib/libnsl_s for the networking library. (As 
mentioned earlier, we use the _s suffix in the pathnames of all statically 
linked shared libraries.) To choose a pathname for your shared library, con- 
sult the established list of names for your computer or see your system 
administrator. Also keep in mind that shared libraries needed to boot a 
UNIX system should normally be located in /shlib; other application 
libraries normally reside in /usr/lib or in private application directories. 
Of course, if your shared library is for personal use, you can choose any 
convenient pathname for the target library. 

Step 3: Selecting Library Contents

Selecting the contents for your shared library is the most important task 
in the building process. Some routines are prime candidates for sharing; 
others are not. For example, it's a good idea to include large, frequently 
used routines in a shared library but to exclude smaller routines that aren't 
used as much. What you include will depend on the individual needs of 
the programmers and other users for whom you are building the library. 
There are some general guidelines you should follow, however. They are 
discussed in the section "Choosing Library Members" in this chapter. Also 
see the guidelines in the following sections: "Importing Symbols,"
"Referencing Symbols in a Shared Library from Another Shared Library," 
and "Tuning the Shared Library Code." 

Step 4: Rewriting Existing Library Code 

If you choose to include some existing code from an archive library in a 
shared library, changing some of the code will make the shared code easier 
to maintain. See the section "Changing Existing Code for the Shared 
Library" in this chapter. 

Step 5: Writing the Library Specification File 

After you select and edit all the code for your shared library, you have 
to build the shared library specification file. The library specification file
contains all the information that mkshlib needs to build both the host and 
target libraries. An example specification file is given in the section 
towards the end of the chapter, "An Example." Also, see the section "Using 
the Specification File for Compatibility" in this chapter. The contents and 
format of the specification file are given by the following directives (see 
also the mkshlib(1) manual page).

All directives that are followed by multi-line specifications are valid 
until the next directive or the end of file. 

#address sectname address 

Specifies the start address, address, of section sectname for 
the target file. This directive is typically used to specify 
the start addresses of the .text and .data sections. 

#target pathname 

Specifies the pathname, pathname, of the target shared 
library on the target machine. This is the location
where the operating system looks for the shared library
during execution. Normally, pathname will be an absolute
pathname, but it does not have to be.

This directive must be specified exactly once per 
specification file. 

#branch Starts the branch table specifications. The lines following
this directive are taken to be branch table
specification lines.

Branch table specification lines have the following format:

funcname <white space> position

funcname is the name of the symbol given a branch table
entry and position specifies the position of funcname's
branch table entry. position may be a single integer or a
range of integers of the form position1-position2. Each
position must be greater than or equal to one. The same
position cannot be specified more than once, and every
position from one to the highest given position must be
accounted for.

If a symbol is given more than one branch table entry 
by associating a range of positions with the symbol or 
by specifying the same symbol on more than one branch 
table specification line, then the symbol is defined to 
have the address of the highest associated branch table 
entry. All other branch table entries for the symbol can 
be thought of as empty slots and can be replaced by new 
entries in future versions of the shared library. 

Finally, only functions should be given branch table 
entries, and those functions must be external. 

This directive must be specified exactly once per shared 
library specification file. 

#objects Specifies the names of the object files constituting the
target shared library. The lines following this directive
are taken to be the list of input object files in the order
they are to be loaded into the target. The list simply
consists of each filename followed by a newline character.
This list of objects will be used to build the shared
library.

This directive must be specified exactly once per shared
library specification file.

#objects noload 

Specifies the ordered list of host shared libraries to be 
searched to resolve references to symbols not defined in 
the library being built and not imported. Resolution of 
a reference in this way requires a version of the symbol 
with an absolute address to be found in one of the listed 
libraries. It's considered an error if a non-shared version 
of a symbol is found during the search for a shared ver- 
sion of the symbol. 

Each name specified is assumed to be a pathname to a
host or an argument of the form -lX, where libX.a is
the name of a file in the default library locations. This
behavior is identical to that of ld, and the -L option can
be used on the command line to specify other directories
in which to locate these archives.

#init object Specifies that the object file, object, requires initialization
code. The lines following this directive are taken to be
initialization specification lines.

Initialization specification lines have the following format:

ptr <white space> import

ptr is a pointer to the associated imported symbol, import,
and must be defined in the current specified object file,
object. The initialization code generated for each such
line is of the form:

ptr = &import;

All initializations for a particular object file must be 
given once and multiple specifications of the same object 
file are not allowed. 

#hide linker [*]

This directive changes symbols that are normally external
into static symbols, local to the library being created. A
regular expression may be given [sh(1), egrep(1)], in
which case all external symbols matching the regular
expression are hidden; the #export directive can be used
to counter this effect for specified symbols.

The optional "*" is equivalent to the directive

#hide linker
*

and causes all external symbols to be made into static
symbols.

All symbols specified in #init and #branch directives 
are assumed to be external symbols, and cannot be 
changed into static symbols using the #hide directive. 

#export linker * 

Specifies those symbols that a regular expression in a 
#hide directive would normally cause to be hidden but 
that should nevertheless remain external. For example, 

    #hide linker *
    #export linker
    one
    two

causes all symbols except one, two, and those used in 
#branch and #init entries to be tagged as static. 

#ident string Specifies a string, string, to be included in the .comment 
section of the target shared library and the .comment 
sections of every member of the host shared library. 

## Specifies a comment. The rest of the line is ignored. 

Step 6: Using mkshlib to Build the Host and Target 

The UNIX system command mkshlib(1) builds both the host and target
libraries. mkshlib invokes other tools such as the assembler, as(1), and link
editor, ld(1). Tools are invoked through the use of execvp (see exec(2)),
which searches directories in a user's $PATH environment variable. Also,
prefixes to mkshlib are parsed in much the same manner as prefixes to the
cc(1) command and invoked tools are given the prefix, where appropriate.
For example, 3bmkshlib invokes 3bld. These commands all are documented
in the Programmer's Reference Manual.

The user input to mkshlib consists of the library specification file and
command line options. We just discussed the specification file; let's take a 
look at the options now. The shared library build tool has the following 
syntax: 

    mkshlib -s specfil -t target [-h host] [-L dir...] [-n] [-q]

-s specfil Specifies the shared library specification file, specfil. This file
contains all the information necessary to build a shared
library.

-t target Specifies the name, target, of the target shared library pro- 
duced on the host machine. When target is moved to the 
target machine, it should be installed at the location given 
in the specification file (see the #target directive in the sec- 
tion "Writing the Library Specification File"). If the -n 
option is given, then a new target shared library will not be 
generated. 

-h host Specifies the name of the host shared library, host. If this
option is not given, then the host shared library will not be
produced.

-n Prevents a new target shared library from being generated.
This option is useful when producing only a new host
shared library. The -t option must still be supplied since a
version of the target shared library is needed to build the
host shared library.

-L dir Changes the algorithm of searching for the host shared
libraries specified with the #objects noload directive to look
in dir before looking in the default directories. The -L
option can be specified multiple times on the command line
in which case the directories given with the -L options are
searched in the order given on the command line before the
default directories.

-q Suppresses the printing of certain warning messages.



Guidelines for Writing Shared Library Code

Because the main advantage of a shared library over an archive library 
is sharing and the space it saves, these guidelines stress ways to increase 
sharing while avoiding the disadvantages of a shared library. The guide- 
lines also stress upward compatibility. When appropriate, we describe our 
experience with building the shared C library to illustrate how following a 
particular guideline helped us. 

We recommend that you read these guidelines once from beginning to 
end to get a perspective of the things you need to consider when building a 
shared library. Then use them as a checklist to guide your planning and
decision-making. 

Before we consider these guidelines, let's consider the restrictions to 
building a shared library common to all the guidelines. These restrictions 
involve static linking. Here's a summary of them, some of which are dis- 
cussed in more detail later. Keep them in mind when reading the guide- 
lines in this section: 

■ External symbols have fixed addresses. 

If an external symbol moves, you have to re-link all a.out files that 
use the library. This restriction applies both to text and data symbols. 

Use of the #hide directive to limit externally visible symbols can
help avoid problems in this area. (See "Use #hide and #export to
Limit Externally Visible Symbols" for more details.)

■ If the library's text changes for one process at run time, it changes for 
all processes. 

■ If the library uses a symbol directly, that symbol's run time value 
(address) must be known when the library is built. 

■ Imported symbols cannot be referenced directly. 

Their addresses are not known when you build the library, and they 
can be different for different processes. You can use imported sym- 
bols by adding an indirection through a pointer in the library's data. 

Choosing Library Members

Include Large, Frequently Used Routines 

These routines are prime candidates for sharing. Placing them in a 
shared library saves code space for individual a.out files and saves memory, 
too, when several concurrent processes need the same code. printf(3S) and 
related C library routines (which are documented in the Programmer's Refer- 
ence Manual) are good examples. 



When we built the shared C library... 

The printf(3S) family of routines is used frequently. 
Consequently, we included printf(3S) and related rou- 
tines in the shared C library. 



Exclude Infrequently Used Routines 

Putting these routines in a shared library can degrade performance, par- 
ticularly on paging systems. Traditional a.out files contain all code they 
need at run time. By definition, the code in an a.out file is (at least dis- 
tantly) related to the process. Therefore, if a process calls a function, it may 
already be in memory because of its proximity to other text in the process. 

If the function is in the shared library, a page fault may be more likely 
to occur, because the surrounding library code may be unrelated to the cal- 
ling process. Only rarely will any single a.out file use everything in the 
shared C library. If a shared library has unrelated functions, and unrelated 
processes make random calls to those functions, the locality of reference 
may be decreased. The decreased locality may cause more paging activity 
and, thereby, decrease performance. See also "Organize to Improve Local- 
ity." 

When we built the shared C library...

Our original shared C library had about 44 KB of text. 
After profiling the code in the library, we removed small 
routines that were not often used. The current library 
has under 29 KB of text. The point is that functions used 
only by a few a.out files do not save much disk space by 
being in a shared library, and their inclusion can cause 
more paging and decrease performance. 



Exclude Routines that Use Much Static Data 

These modules increase the size of processes. As "How Shared Libraries 
are Implemented" and "Deciding Whether to Use a Shared Library" explain, 
every process that uses a shared library gets its own private copy of the 
library's data, regardless of how much of the data is needed. Library data is 
static: it is not shared and cannot be loaded selectively with the provision 
that unreferenced pages may be removed from the working set. 

For example, getgrent(3C), which is documented in the Programmer's 
Reference Manual, is not used by many standard UNIX commands. Some
versions of the module define over 1400 bytes of unshared, static data. It 
probably should not be included in a shared library. You can import global 
data, if necessary, but not local, static data. 

Exclude Routines that Complicate Maintenance 

All external symbols must remain at constant addresses. The branch 
table makes this easy for text symbols, but data symbols don't have an 
equivalent mechanism. The more data a library has, the more likely some
of it will have to change size. Any change in the size of external data
may affect symbol addresses and break compatibility. 

Include Routines the Library Itself Needs 

It usually pays to make the library self-contained. For example,
printf(3S) requires much of the standard I/O library. A shared library con-
taining printf(3S) should contain the rest of the standard I/O routines, too.

NOTE

This guideline should not take priority over the others in this section.
If you exclude some routine that the library itself needs based on a
previous guideline, consider leaving the symbol out of the library and
importing it.



Changing Existing Code for the Shared Library

All C code that works in a shared library will also work in an archive
library. However, the reverse is not true because a shared library must 
explicitly handle imported symbols. The following guidelines are meant to 
help you produce shared library code that is still valid for archive libraries 
(although it may be slightly bigger and slower). The guidelines explain 
how to structure data for ease of maintenance, since most compatibility 
problems involve restructuring data. 

Minimize Global Data 

All external data symbols are, of course, visible to applications. This 
can make maintenance difficult. You should try to reduce global data, as 
described below. 

First, try to use automatic (stack) variables. Don't use permanent storage 
if automatic variables work. Using automatic variables saves static data 
space and reduces the number of symbols visible to application processes. 

Second, see whether variables really must be external. Static symbols 
are not visible outside the library, so they may change addresses between 
library versions. Only external variables must remain constant. See the sec- 
tion "Use #hide and #export to Limit Externally Visible Symbols" later in 
this chapter for further tips. 

Third, allocate buffers at run time instead of defining them at compile 
time. This does two important things. It reduces the size of the library's 
data region for all processes and, therefore, saves memory; only the 
processes that actually need the buffers get them. It also allows the size of 
the buffer to change from one release to the next without affecting compati- 
bility. Statically allocated buffers cannot change size without affecting the 
addresses of other symbols and, perhaps, breaking compatibility. 
Define Text and Global Data in Separate Source Files

Separating text from global data makes it easier to prevent data symbols 
from moving. If new external variables are needed, they can be added at 
the end of the old definitions to preserve the old symbols' addresses. 

Archive libraries let the link editor extract individual members. This 
sometimes encourages programmers to define related variables and text in 
the same source file. This works fine for relocatable files, but shared 
libraries have a different set of restrictions. Suppose external variables were 
scattered throughout the library modules. Then external and static data 
would be intermixed. Changing static data, such as a string, like hello in 
the following example, moves subsequent data symbols, even the external 
symbols: 



        Before                      Broken Successor

        int head = 0;               int head = 0;

        func()                      func()
        {                           {
            p = "hello";                p = "hello, world";
        }                           }

        int tail = 0;               int tail = 0;



Assume the relative virtual address of head is 0 for both examples. The
string literals will have the same address too, but they have different 
lengths. The old and new addresses of tail thus might be 12 and 20, respec- 
tively. If tail is supposed to be visible outside the library, the two versions 
will not be compatible. 
NOTE

The compilation system sometimes defines and uses static data invisibly
to the user (e.g., tables for switch statements). Therefore, it is a
mistake to assume that, because you declare no static data in your
shared library, you can ignore the guideline in this section.



Adding new external variables to a shared library may change the 
addresses of static symbols, but this doesn't affect compatibility. An a.out 
file has no way to reference static library symbols directly, so it cannot 
depend on their values. Thus it pays to group all external data symbols and 
place them at lower addresses than the static (hidden) data. You can write 
the specification file to control this. In the list of object files, put the glo-
bal data files first.



        #objects
                data1.o
                ...
                lastdata.o
                text1.o
                text2.o




If the data modules are not first, a seemingly harmless change (such as a 
new string literal) can break existing a.out files. 

Shared library users get all library data at run time, regardless of the 
source file organization. Consequently, you can put all external variables' 
definitions in a single source file without a space penalty. 

Initialize Global Data

Initialize external variables, including the pointers for imported sym-
bols. Although this uses more disk space in the target shared library, the
expansion is limited to a single file. mkshlib will give a fatal error if it
finds an uninitialized external symbol.

Using the Specification File for Compatibility

The way in which you use the directives in the specification file can
affect compatibility across versions of a shared library. This section gives
some guidelines on how to use the directives #branch, #hide, and #export.

Preserve Branch Table Order

You should add new functions only at the end of the branch table. 
After you have a specification file for the library, try to maintain compatibil- 
ity with previous versions. You may add new functions without breaking 
old a.out files as long as previous assignments are not changed. This lets 
you distribute a new library without having to re-link all of the a.out files 
that used a previous version of the library. 

Use #hide and #export to Limit Externally Visible Symbols 

Sometimes variables (or functions) must be referenced from several 
object files to be included in the shared library and yet are not intended to 
be available to users of the shared library. That is, they must be external so 
that the link editor can properly resolve all references to symbols and create 
the target shared library, but should be hidden from the user's view to 
prevent their use. Such unintended and unwanted use may result in com- 
patibility problems if the symbols move or are removed between versions of 
the shared library. 

The #hide and #export directives are the key to resolving this 
dilemma. The #hide directive causes mkshlib, after resolving all references 
within the shared library, to alter the symbol tables of the shared library so 
that all specified external symbols are made static and inaccessible from
user code. You can specify the symbols to be so treated individually and/or
through the use of regular expressions. 

The #export directive allows you to specify those symbols in the range 
of an accompanying #hide directive regular expression which should 
remain external. It is simply a convenience. 



NOTE 



It is a fatal error to try to explicitly name the same symbol in a #hide and 
an #export directive. For example, the following would result in a fatal 
error.

        #hide linker
                one

        #export linker
                one



#export may seem like an unnecessary feature since you could avoid 
specifying in the #hide directive those symbols that you do not want to be 
made static. However, its usefulness becomes apparent when the shared 
library to be built is complicated and there are many symbols to be made 
static. In these cases, it is more efficient to use regular expressions to make
all external variables static and individually list those symbols you need to
be external. The simple example in the section "Writing the Library
Specification File" demonstrates this point.



NOTE 



Symbols mentioned in the #branch and #init directives are services of 
the shared library, must be external symbols, and cannot be made static 
through the use of these directives. 



When we built the shared C library... 

Our approach for the shared C library was to hide all 
data symbols by default, and then explicitly export sym- 
bols that we knew were needed. The advantage of this 
approach is that future changes to the libraries won't 
introduce new external symbols (possibly causing name 
collisions) unless we explicitly export the new symbols. 

We chose the symbols to export by looking at a list of all 
the current external symbols in the shared C library and 
finding out what each symbol was used for. The sym- 
bols that were global but were only used in the shared C 
library were not exported; these symbols will be hidden 
from applications code. All other symbols were expli-
citly exported.

Importing Symbols

Normally, shared library code cannot directly use symbols defined out- 
side a library, but an escape hatch exists. You can define pointers in the 
data area and arrange for those pointers to be initialized to the addresses of 
imported symbols. Library code then accesses imported symbols indirectly, 
delaying symbol binding until run time. Libraries can import both text and 
data symbols. Moreover, imported symbols can come from the user's code, 
another library, or even the library itself. In Figure 8-4, the symbols
_libc_ptr1 and _libc_ptr2 are imported from user's code and the symbol
_libc_malloc from the library itself.

[Diagram: two panels, "Shared Library" and "a.out File"; labels include
malloc() and _libc_ptr2]

Figure 8-4: Imported Symbols in a Shared Library




The following guidelines describe when and how to use imported sym- 
bols. 

Imported Symbols that the Library Does Not Define

Archive libraries typically contain relocatable files, which allow 
undefined references. Although the host shared library is an archive, too, 
that archive is constructed to mirror the target library, which more closely 
resembles an a.out file. Neither target shared libraries nor a.out files can 
have unresolved references to symbols.

Consequently, shared libraries must import any symbols they use but do
not define. Some shared libraries will derive from existing archive libraries. 
For the reasons stated above, it may not be appropriate to include all the 
archive's modules in the target shared library. Remember though that if 
you exclude a symbol from the target shared library that is referenced from 
the target shared library, you will have to import the excluded symbol. 

Imported Symbols that Users Must Be Able to Redefine 

Optionally, shared libraries can import their own symbols. At first this 
might appear to be an unnecessary complication, but consider the follow-
ing. Two standard libraries, libc and libmalloc, provide a malloc family.
Even though most UNIX commands use the malloc from the C library, they
can choose either library or define their own.

When we built the shared C library...

Three possible strategies existed for the shared C library. 
First, we could have excluded the malloc(3C) family. 
Other library members would have needed it, and so it 
would have been an imported symbol. This would have 
worked, but it would have meant less savings. 

Second, we could have included the malloc family and
not imported it. This would have given us more savings
for typical commands, but it had a price. Other library
routines call malloc directly, and those calls could not
have been overridden. If an application tried to redefine
malloc, the library calls would not have used the alter-
nate version. Furthermore, the link editor would have
found multiple definitions of malloc while building the
application. To resolve this the library developer would
have to change source code to remove the custom
malloc, or the developer would have to refrain from using
the shared library.

Finally, we could have included malloc in the shared
library, treating it as an imported symbol. This is what
we did. Even though malloc is in the library, nothing
else there refers to it directly; all references are through
an imported symbol pointer. If the application does not
redefine malloc, both application and library calls are
routed to the library version. All calls are mapped to the
alternate, if present.



You might want to permit redefinition of all library symbols in some 
libraries. You can do this by importing all symbols the library defines, in 
addition to those it uses but does not define. Although this adds a little 
space and time overhead to the library, the technique allows a shared 
library to be one hundred percent compatible with an existing archive at 
link time and run time.

Mechanics of Importing Symbols

Let's assume a shared library wants to import the symbol malloc. The 
original archive code and the shared library code appear below. 



Archive Code

        extern char *malloc();

        export()
        {
                p = malloc(n);
        }

Shared Library Code

        /* See pointers.c on next page */

        extern char *(*_libc_malloc)();

        export()
        {
                p = (*_libc_malloc)(n);
        }



Making this transformation is straightforward, but two sets of source 
code would be necessary to support both an archive and a shared library. 
Some simple macro definitions can hide the transformations and allow 
source code compatibility. A header file defines the macros, and a different 
version of this header file would exist for each type of library. The -I flag
to cc(1) would direct the C preprocessor to look in the appropriate directory
to find the desired file.

Archive import.h                Shared import.h

/* empty */                     /*
                                 * Macros for importing
                                 * symbols.  One #define
                                 * per symbol.
                                 */

                                #define malloc (*_libc_malloc)
                                extern char *malloc();



These header files allow one source both to serve the original archive 
source and to serve a shared library, too, because they supply the indirec- 
tions for imported symbols. The declaration of malloc in import.h actually 
declares the pointer _libc_malloc. 



Common Source

        #include "import.h"

        extern char *malloc();

        export()
        {
                p = malloc(n);
        }

Alternatively, one can hide the #include with #ifdef:

Common Source

        #ifdef SHLIB
        # include "import.h"
        #endif

        extern char *malloc();

        export()
        {
                p = malloc(n);
        }

Of course the transformation is not complete. You must define the
pointer _libc_malloc.

File pointers.c

        char *(*_libc_malloc)() = 0;

Note that _libc_malloc is initialized to zero, because it is an external
data symbol.

Special initialization code sets the pointers. Shared library code should 
not use the pointer before it contains the correct value. In the example the 
address of malloc must be assigned to _libc_malloc. Tools that build the
shared library generate the initialization code according to the library 
specification file. 

Pointer Initialization Fragments 

A host shared library archive member can define one or many imported 
symbol pointers. Regardless of the number, every imported symbol pointer 
should have initialization code.

This code goes into the a.out file and does two things. First, it creates
an unresolved reference to make sure the symbol being imported gets 
resolved. Second, initialization fragments set the imported symbol pointers 
to their values before the process reaches main. If the imported symbol 
pointer can be used at run time, the imported symbol will be present, and 
the imported symbol pointer will be set properly. 



NOTE 



Initialization fragments reside in the host, not the target, shared library. 
The link editor copies initialization code into a.out files to set imported 
pointers to their correct values. 



Library specification files describe how to initialize the imported symbol 
pointers. For example, the following specification line would set 
_libc_malloc to the address of malloc: 

        #init pmalloc.o
                _libc_malloc    malloc

When mkshlib builds the host library, it modifies the file pmalloc.o,
adding relocatable code to perform the following assignment statement:

        _libc_malloc = &malloc;

When the link editor extracts pmalloc.o from the host library, the relo- 
catable code goes into the a.out file. As the link editor builds the final a.out 
file, it resolves the unresolved references and collects all initialization frag- 
ments. When the a.out file is executed, the run time startup routines exe- 
cute the initialization fragments to set the library pointers. 

Selectively Loading Imported Symbols

Defining fewer pointers in each archive member increases the granular- 
ity of symbol selection and can prevent unnecessary objects and initializa- 
tion code from being linked into the a.out file. For example, if an archive 
member defines three pointers to imported symbols, the link editor will 
require definitions for all three symbols, even though only one might be 
needed.

You can reduce unnecessary loading by writing C source files that
define imported symbol pointers singly or in related groups. If an imported 
symbol must be individually selectable, put its pointer in its own source file 
(and archive member). This will give the link editor a finer granularity to 
use when it resolves the reference to the symbol. 

Let's look at an example. In the coarse method, a single source file 
might define all pointers to imported symbols: 



Old pointers.c 

int (*_libc_ptr1)() = 0; 
char *(*_libc_malloc)() = 0; 
int (*_libc_ptr2)() = 0;


Allowing the loader to resolve only those references that are needed 
requires multiple source files and archive members. Each of the new files 
defines a single pointer: 



        File            Contents

        ptr1.c          int (*_libc_ptr1)() = 0;

        pmalloc.c       char *(*_libc_malloc)() = 0;

        ptr2.c          int (*_libc_ptr2)() = 0;



Using the three files ensures that the link editor will only look for 
definitions for imported symbols and load in the corresponding initializa- 
tion code in cases where the symbols are actually used. 

Referencing Symbols in a Shared Library from Another Shared 
Library 

At the beginning of the section "Importing Symbols," there was a state-
ment that "normally shared libraries cannot directly use symbols defined
outside the shared library." This is true in general, and you should import
all symbols defined outside the shared library whenever possible.

Unfortunately, this is not always possible, for example when floating- 
point operations are performed in a shared library to be built. When such 
operations are encountered in any C code, the standard C compiler gen- 
erates calls to functions to perform the actual operations. These functions 
are defined in the C library and are normally resolved invisibly to the user 
when an a.out is created since the cc command automatically causes the 
relocatable version of the C library to be searched. When building a shared 
library, these floating-point routine references must be resolved at the time 
the shared library is being built. But the symbols cannot be imported
because their names and usage are invisible. 

The #objects noload directive has been provided to allow symbol refer- 
ences such as these to be resolved at the time the shared library is built, 
provided that the symbols are defined in another shared library. If there 
are unresolved references to symbols after the object files listed with the 
#objects directive have been link edited, the host shared libraries specified 
with the #objects noload directive are searched for absolute definitions of 
the symbols. The normal use of the directive would be to search the shared 
version of the C library to resolve references to floating-point routines. 

For this use, the syntax in the specification file would be:

        #objects noload
                -lc_s

This would cause mkshlib to search for the host shared library libc_s.a in 
the default library locations and to use it to resolve references to any sym- 
bols left unresolved in the shared library being built. The -L option can be 
used to cause mkshlib to look for the specified library in other than the 
default locations. 

A few notes on usage are in order. When building a shared library 
using #objects noload you must make sure that for each symbol with an 
unresolved reference there is a version of the symbol with an absolute 
definition in the searched host shared libraries before any relocatable ver-
sion of that symbol. mkshlib will give a fatal error if this is not the case
because relocatable definitions do not have absolute addresses and therefore 
do not allow complete resolution of the target shared library.

When using a shared library built with references to symbols resolved
from another shared library, both libraries must be specified on the cc com-
mand line. The dependent library must be specified on the command line
before the libraries on which it depends. (See the section "Building an
a.out File" for more details.) If you provide a shared library which refer- 
ences symbols in another shared library, you should make sure that your 
documentation clearly states that users must specify both libraries when 
building a.out files. 

Finally, as some of the text above hints, it is possible to use #objects 
noload to resolve references to any symbols not defined in a shared library 
as long as they are defined in some other shared library. We strongly 
encourage you to import as many symbols as possible and to use #objects 
noload only when absolutely necessary. Probably you will only need to use 
this feature to resolve references to floating-point routines generated by the 
C compiler. 

Importing symbols has several important benefits over resolving refer- 
ences through #objects noload. First, importing symbols is more flexible in 
that it allows you to define your own version of library routines. You can 
define your own versions with archive versions of a library. Preserving this 
ability with the shared versions helps maintain compatibility. 

Importing symbols also helps prevent unexpected name space collisions. 
The link editor will complain about multiple definitions of a symbol, refer- 
ences to which are resolved through the #objects noload mechanism, if a 
user of the shared library also has an external definition of the symbol. 

Finally, #objects noload has the drawback that both the library you 
build and all the libraries on which it depends must be available on all the 
systems. Anyone who wishes to create a.out files using your shared library 
will need to use the host shared libraries. Also, the targets of all the 
libraries must be available on all systems on which the a.out files are to be 
run. 

Providing Archive Library Compatibility 

Having compatible libraries makes it easy to substitute one for the 
other. In almost all cases, this can be done without makefile or source file 
changes. Perhaps the best way to explain this guideline is by example:

When we built the shared C library...

We had an existing archive library to use as the base. 
This obviously gave us code for individual routines, and 
the archive library also gave us a model to use for the 
shared library itself. 

We wanted the host library archive file to be compatible 
with the relocatable archive C library. However, we did 
not want the shared library target file to include all rou- 
tines from the archive: including them all would have 
hurt performance. 

Reaching these goals was, perhaps, easier than you 
might think. We did it by building the host library in 
two steps. First, we used the available shared library 
tools to create the host library to match exactly the tar- 
get. The resulting archive file was not compatible with 
the archive C library at this point. Second, we added to 
the host library the set of relocatable objects residing in 
the archive C library that were missing from the host 
library. Although this set is not in the shared library 
target, its inclusion in the host library makes the relocat- 
able and shared C libraries compatible. 



Tuning the Shared Library Code 

Some suggestions for how to organize shared library code to improve 
performance are presented here. They apply to paging systems, such as 
UNIX System V Release 3.0. The suggestions come from the experience of 
building the shared C library. 

The archive C library contains several diverse groups of functions. 
Many processes use different combinations of these groups, making the pag-
ing behavior of any shared C library difficult to predict. A shared library
should offer greater benefits for more homogeneous collections of code. For 
example, a data base library probably could be organized to reduce system 
paging substantially, if its static and dynamic calling dependencies were 
more predictable.

Profile the Code

To begin, profile the code that might go into the shared library. 

Choose Library Contents 

Based on profiling information, make some decisions about what to 
include in the shared library. a.out file size is a static property, and paging
is a dynamic property. These static and dynamic characteristics may 
conflict, so you have to decide whether the performance lost is worth the 
disk space gained. See "Choosing Library Members" in this chapter for 
more information. 

Organize to Improve Locality

When a function is in a.out files, it probably resides in a page with 
other code that is used more often (see "Exclude Infrequently Used Rou- 
tines"). Try to improve locality of reference by grouping dynamically 
related functions. If every call of funcA generates calls to funcB and funcC, 
try to put them in the same page. cflow(1) (documented in the Programmer's
Reference Manual) generates this static dependency information. Combine it 
with profiling to see what things actually are called, as opposed to what 
things might be called. 

Align for Paging 

The key is to arrange the shared library target's object files so that fre- 
quently used functions do not unnecessarily cross page boundaries. When 
arranging object files within the target library, be sure to keep the text and 
data files separate. You can reorder text object files without breaking com- 
patibility; the same is not true for object files that define global data. Once 
again, an example might best explain this guideline:

When we built the shared C library...

We used a 3B2 Computer to build the library; the archi- 
tecture of the 3B2 Computer uses 2 KB pages. Using 
name lists and disassemblies of the shared library target 
file, we determined where the page boundaries fell. 

After grouping related functions, we broke them into 
page-sized chunks. Although some object files and func- 
tions are larger than a single page, most of them are 
smaller. Then we used the infrequently called functions 
as glue between the chunks. Because the glue between 
pages is referenced less frequently than the page con- 
tents, the probability of a page fault decreased. 

After determining the branch table, we rearranged the 
library's object files without breaking compatibility. We 
put frequently used, unrelated functions together, 
because we figured they would be called randomly 
enough to keep the pages in memory. System calls went 
into another page as a group, and so on. The following 
example shows how to change the order of the library's 
object files: 

        Before                  After

        #objects                #objects
        printf.o                strcmp.o
        fopen.o                 malloc.o
        malloc.o                printf.o
        strcmp.o                fopen.o

Avoid Hardware Thrashing

Finally, you may have to consider the hardware you're using to obtain 
better performance. Using the 3B2 Computer, for example, you need to 
consider its memory management. Part of the memory management 
hardware is an 8-entry cache for translating virtual to physical addresses. 
Each segment (128 KB) is mapped to one of the eight entries. Consequently, 
segments 0, 8, 16, ... use entry 0; segments 1, 9, 17, ... use entry 1; and so on. 

You get better performance by arranging the typical process to avoid 
cache entry conflicts. If a heavily used library had both its text and its data 
segment mapped to the same cache entry, the performance penalty would 
be particularly severe. Every library instruction would bring the text seg- 
ment information into the cache. Instructions that referenced data would 
flush the entry to load the data segment. Of course, the next instruction 
would reference text and flush the cache entry, again. 



When we built the shared C library... 

We avoided the cache entry conflicts. At least with the 
3B2 Computer architecture, a library's text and data seg- 
ment numbers should differ by something other than 
eight. 



Checking for Compatibility 

The following guidelines explain how to check for upward-compatible 
shared libraries. Note, however, that upward compatibility may not always 
be an issue. Consider the case in which a shared library is one piece of a 
larger system and is not delivered as a separate product. In this restricted 
case, you can identify all a.out files that use a particular library. As long as 
you rebuild all the a.out files every time the library changes, the a.out files 
will run successfully, even though versions of the library are not compati- 
ble. This may complicate development, but it is possible. 

Checking Versions of Shared Libraries Using chkshlib(l) 

Shared library developers normally want newer versions of a library to 
be compatible with previous ones. As mentioned before, a.out files will not 
execute properly otherwise.

If you use shared libraries, you might need to find out if different ver-
sions of a shared library are compatible, or if executable files could have
been built with a particular host shared library or can run with a particular 
target shared library. For example, you might have a new version of a tar- 
get shared library, and you need to know if all the executable files that ran 
with the older version will run with the new one. You might need to find 
out if a particular target shared library can reference symbols in another 
shared library. A command, chkshlib(l), has been provided to allow you to 
do these and other comparisons. 

chkshlib takes names of target shared libraries, host shared libraries, 
and executable files as input, and checks to see if those files satisfy the com-
patibility criteria. chkshlib checks to see if every library symbol in the first
file that needs to be matched exists in the second file and has the same 
address. The following table shows what types of files and how many of 
them chkshlib accepts as input. 

The rows listed down represent the first input given, and the columns 
listed across represent any more inputs given. For example, if the first 
input file you give chkshlib is a target shared library, you must give 
another input file that is a target or host shared library. 





                       Additional inputs 
First input    Nothing    Executable    Target    Host 

Executable     OK         No            OK*       OK* 
Target         No         No            OK        OK 
Host           OK         No            OK        OK 

* The executable file must be one that was built using a host shared library. 
  A useful way to confirm this is to use dump -L to find out which 
  target file(s) gets loaded when the program is run. See dump(1). 

  You can also have executable target1...targetn and executable host1...hostn. 

An example of a chkshlib command line is shown below: 

chkshlib /shlib/libc_s /lib/libc_s.a 

In this example, /shlib/libc_s is a target shared library and /lib/libc_s.a is a 
host shared library. chkshlib will check to see if executable files built with 
/lib/libc_s.a would be able to run with /shlib/libc_s. 

Depending on the input it receives, chkshlib checks to find out if: 

■ an executable file will run with the given target shared library 

■ an executable file could have been built using the given host shared 
library 

■ an executable file produced with a given host shared library will run 
with a given target shared library 

■ an executable file that ran with an old version of a target shared 
library will run with a new version 

■ a new host shared library can replace the old host shared library; that 
is, executable files built with the new host shared library will run 
with the old target shared library 

■ a target shared library can reference symbols in another target shared 
library 



To determine if files are compatible, you have to determine which 
library symbols in the first file need to be matched in the second file. 

■ For target shared libraries, the symbols of concern are all external, 
defined symbols with non-zero values, except for branch labels 
(branch labels always start with .bt), and the special symbols etext, 
edata, and end. 

■ For host shared libraries, the symbols of concern are all external 
absolute symbols with a non-zero value. 

■ For executable files, the symbols of concern are all external absolute 
symbols with a non-zero value except for the special symbols etext, 
edata, and end. 

For two files to be compatible, the target pathnames must be identical in 
both files (unless the -i option has been specified). 
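The matching rule described above can be sketched in a few lines of C. This is only an illustration of the idea, not the way chkshlib is implemented: the demo_sym structure and the demo_ function names are hypothetical, and the sketch assumes the symbols of concern have already been extracted from each file into a simple table.

```c
#include <string.h>

/* Hypothetical, simplified symbol record: a name and its address (value). */
struct demo_sym {
    const char *name;
    unsigned long value;
};

/* Return the entry in table[] named `name`, or NULL if there is none. */
static const struct demo_sym *
demo_find(const struct demo_sym *table, int n, const char *name)
{
    int i;

    for (i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/*
 * Return 1 if every symbol in first[] exists in second[] with the same
 * value, and 0 otherwise. Extra symbols in second[] do not matter.
 */
int
demo_compatible(const struct demo_sym *first, int nfirst,
                const struct demo_sym *second, int nsecond)
{
    int i;

    for (i = 0; i < nfirst; i++) {
        const struct demo_sym *match =
            demo_find(second, nsecond, first[i].name);

        if (match == NULL || match->value != first[i].value)
            return 0;
    }
    return 1;
}
```

Note that the check is not symmetric: a library that defines additional symbols can still satisfy an executable, but the reverse comparison fails.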

The following table displays the output you will receive when you use 
chkshlib to check different combinations of files for compatibility. In this 
table file1 represents the name of the first file given, and file2,3,... represents 
the names of any more files given as input. 

Input                               Output 

file1 is executable                 file1 can [may not] execute using file2 
file2,3,... (if any) are targets    file1 can [may not] execute using file3 

file1 is executable                 file1 may [may not] have been produced using file2 
file2,3,... are hosts               file1 may [may not] have been produced using file3 

file1 is host                       file1 can [may not] produce executables which 
file2 (if any) is target            will run with file2 

file1 is target                     file2 can [may not] produce executables which 
file2 is host                       will run with file1 

both files are targets or           file1 can [may not] replace file2 
both files are hosts                file2 can [may not] replace file1 

both files are targets and          file1 can [may not] include file2 
-n option is specified* 

* The -n option tells chkshlib that the two files are target shared libraries, 
  the first of which can reference (include) symbols in the other. See 
  "Referencing Another Shared Library Within A Shared Library" for details. 

For more information on chkshlib, see chkshlib(1). 

When we built the shared C library... 

When we built the second version of the shared C 
library and checked it against the first version, chkshlib 
reported that many external symbols had different values 
and therefore the second version could not replace the 
first. Here is a list of these symbols: 

    _bigpow            _invalid.single    _round.double 
    _inf1.double       _litpow            _round.single 
    _inf1.single       _qnan1.double      _trap.single 
    _inf2.double       _qnan1.single      _type.double 
    _inf2.single       _qnan2.double      _type.single 
    _invalid.double    _qnan2.single 

Since these text symbols were not intended to be user 
entry points, they were not put in the branch table. So 
when new code was added to the shared library the 
addresses of these text symbols changed, and hence their 
values changed. 

We devised the #hide and #export directives to 
allow us to explicitly hide the symbols we did not want 
to be user entry points. In fact, in the latest C Shared 
Library we hid all the symbols, and exported just the 
ones we want to be user entry points. 

You cannot directly reference these functions, and 
these symbols will not be considered incompatible by 
chkshlib in checking the latest version of the shared C 
library with any subsequent version. 




Dealing with Incompatible Libraries 

When you determine that a newer version of a library can't replace the 
older version, you have to deal with the incompatibility. You can deal with 
it in one of two ways. First, you can rebuild all the a.out files that use your 
library. If feasible, this is probably the best choice. Unfortunately, you 
might not be able to find those a.out files, let alone force their owners to 
rebuild them with your new library. 

So your second choice is to give a different target pathname to the new 
version of the library. The host and target pathnames are independent; so 
you don't have to change the host library pathname. New a.out files will 
use your new target library, but old a.out files will continue to access the 
old library. 

As the library developer, it is your responsibility to check for compati- 
bility and, probably, to provide a new target library pathname for a new 
version of a library that is incompatible with older versions. If you fail to 
resolve compatibility problems, a.out files that use your library will not 
work properly. 



NOTE 



You should try to avoid multiple library versions. If too many copies of 
the same shared library exist, they might actually use more disk space and 
more memory than the equivalent relocatable version would have. 



An Example 

This section contains an example of a small specialized shared library 
and the process by which it is created from original source and built. We 
refer to the guidelines given earlier in this chapter. 

The Original Source 

The name of the library to be built is libmaux (for math auxiliary 
library). The interface consists of three functions: 

logd         floating-point logarithm to a given base; defined in the file 
             log.c 

polyd        evaluate a polynomial; defined in the file poly.c 

maux_stat    return usage counts for the other two routines in a structure; 
             defined in stats.c, 

an external variable: 

mauxerr      set to non-zero if there is an error in the processing of any 
             of the functions in the library and set to zero if there is no 
             error (unlike errno in the C library), 

and a header file: 

maux.h       declares the return types of the functions and the structure 
             returned by maux_stat. 

The source files before any modifications for inclusion in a shared 
library are given below. 

        /* log.c */ 
        #include "maux.h" 
        #include <math.h> 

        /* 
         * Return the log of "x" relative to the base "a". 
         * 
         *      logd(base, x) := log(x) / log(base); 
         * 
         * where "log" is "log to the base E". 
         */ 

        double logd(base, x) 
        double base, x; 
        { 
                extern int stats_logd; 
                extern int total_calls; 

                double logbase; 
                double logx; 

                total_calls++; 
                stats_logd++; 

                logbase = log((double)base); 
                logx = log((double)x); 

                if (logbase == -HUGE || logx == -HUGE) { 
                        mauxerr = 1; 
                        return(0); 
                } 
                else 
                        mauxerr = 0; 
                return(logx/logbase); 
        } 





Figure 8-5: File log.c 

        /* poly.c */ 

        #include "maux.h" 
        #include <math.h> 

        /* Evaluate the polynomial 
         *      f(x) := a[0] * (x ^ n) + a[1] * (x ^ (n-1)) + ... + a[n]; 
         * Note that there are N+1 coefficients! 
         * This uses Horner's Method, which is: 
         *      f(x) := (((((a[0]*x) + a[1])*x) + a[2]) + ...) + a[n]; 
         * It's equivalent, but uses many less operations and is more precise. */ 

        double polyd(a, n, x) 
        double a[]; 
        int n; 
        double x; 
        { 
                extern int stats_polyd; 
                extern int total_calls; 
                double result; 
                int i; 

                total_calls++; 
                stats_polyd++; 
                if (n < 0) { 
                        mauxerr = 1; 
                        return(0); 
                } 

                result = a[0]; 
                for (i = 1; i <= n; i++) 
                { 
                        result *= (double)x; 
                        result += (double)a[i]; 
                } 

                mauxerr = 0; 
                return(result); 
        } 



Figure 8-6: File poly.c 

        /* stats.c */ 
        #include "maux.h" 

        int total_calls = 0; 
        int stats_logd = 0; 
        int stats_polyd = 0; 

        int mauxerr; 

        /* Return structure with usage stats for functions in library 
         * or 0 if space cannot be allocated for the structure */ 
        struct mstats * 
        maux_stat() 
        { 
                extern char * malloc(); 
                struct mstats * st; 

                if((st = (struct mstats *) malloc(sizeof(struct mstats))) == 0) 
                        return(0); 
                st->st_polyd = stats_polyd; 
                st->st_logd = stats_logd; 
                st->st_total = total_calls; 
                return(st); 
        } 




Figure 8-7: File stats.c 

        /* maux.h */ 

        struct mstats { 
                int st_polyd; 
                int st_logd; 
                int st_total; 
        }; 

        extern double polyd(); 
        extern double logd(); 
        extern struct mstats * maux_stat(); 

        extern int mauxerr; 




Figure 8-8: Header File maux.h 
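The comment in poly.c says that Horner's Method is equivalent to the term-by-term formula but uses fewer operations. The sketch below, written in ANSI C rather than the older style of the listings, puts the two side by side so the claim can be checked. The demo_ function names are hypothetical and are not part of libmaux; like polyd, both functions assume n is not negative.

```c
/* Naive evaluation: a[0]*x^n + a[1]*x^(n-1) + ... + a[n]. */
double
demo_poly_naive(const double a[], int n, double x)
{
    double result = 0.0;
    int i, j;

    for (i = 0; i <= n; i++) {
        double term = a[i];

        for (j = 0; j < n - i; j++)   /* multiply by x, (n - i) times */
            term *= x;
        result += term;
    }
    return result;
}

/* Horner's Method: one multiplication and one addition per coefficient. */
double
demo_poly_horner(const double a[], int n, double x)
{
    double result = a[0];
    int i;

    for (i = 1; i <= n; i++)
        result = result * x + a[i];
    return result;
}
```

For the coefficients 2, 3, 4 (n = 2) and x = 2, both functions compute 2*4 + 3*2 + 4 = 18. The naive version needs on the order of n*n multiplications, while Horner's Method needs only n.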



Choosing Region Addresses and the Target Pathname 

To begin, we choose the region addresses for the library's .text and .data 
sections from the segments reserved for private use on the 3B2 Computer; 
note that the region addresses must be on a segment boundary (128K): 

        .text   0x80680000 
        .data   0x806a0000 

Also we choose the pathname for our target library: 

        /my/directory/libmaux_s 

Selecting Library Contents 

This example is for illustration purposes and so we will include 
everything in the shared library. In a real case, it is unlikely that you would 
make a shared library with these three small routines unless you had many 
programmers using them frequently. 

Rewriting Existing Code 

According to the guidelines given earlier in the chapter, we need to 
first minimize the global data. We realize that total_calls, stats_logd, and 
stats_polyd do not need to be visible outside the library, but are needed in 
multiple files within the library. Hence, we will use the #hide directive in 
our specification file to make these variables static after the shared library is 
built. 

We need to define text and global data in separate source files. The 
only piece of global data we have left is mauxerr, which we will remove 
from stats.c and put in a new file maux_defs.c. We will also have to 
initialize it to zero since shared libraries cannot have any uninitialized variables. 

Next, we notice that there are some references to symbols that we do 
not define in our shared library (i.e. log and malloc). We can import these 
symbols. To do so, we create a new header file, import.h, which will be 
included in each of log.c, poly.c, and stats.c. The header file defines C 
preprocessor macros for these symbols to make transparent the use of 
indirection in the actual C source files. We use the _libmaux_ prefixes on 
the pointers to the symbols because those pointers are made external, and 
the use of the library name as a prefix helps prevent name conflicts. 



        /* New header file import.h */ 
        #define malloc  (*_libmaux_malloc) 
        #define log     (*_libmaux_log) 

        extern char * malloc(); 
        extern double log(); 



Now, we need to define the imported symbol pointers somewhere. We 
have already created a file for global data, maux_defs.c, so we will add the 
definitions to it. 

        /* Data file maux_defs.c */ 

        int mauxerr = 0; 

        double (*_libmaux_log)() = 0; 

        char * (*_libmaux_malloc)() = 0; 



Finally, we observe that there are floating-point operations in the code 
and we remember that the routines for these cannot be imported. (If we 
tried to write the specification file and build the shared library without 
taking this into account, mkshlib would give us errors about unresolved 
references.) This means we will have to use the #objects noload directive in 
our specification file to search the C host shared library to resolve the 
references. 



Writing the Specification File 

This is the specification file for libmaux: 

         1   ## 
         2   ## libmaux.sl - libmaux specification file 
         3   #address .text 0x80680000 
         4   #address .data 0x806a0000 
         5   #target /my/directory/libmaux_s 
         6   #branch 
         7        polyd                1 
         8        logd                 2 
         9        maux_stat            3 
        10   #objects 
        11        maux_defs.o 
        12        poly.o 
        13        log.o 
        14        stats.o 
        15   #objects noload 
        16        -lc_s 
        17   #hide linker * 
        18   #export linker 
        19        mauxerr 
        20   #init maux_defs.o 
        21        _libmaux_malloc      malloc 
        22        _libmaux_log         log 



Figure 8-9: Specification File 



Briefly, here is what the specification file does. Lines 1 and 2 are com- 
ment lines. Lines 3 and 4 give the virtual addresses for the shared library 
text and data regions, respectively. Line 5 gives the pathname of the shared 
library on the target machine. The target shared library must be installed 
there for a.out files that use it to work correctly. Line 6 contains the 
#branch directive. Lines 7 through 9 specify the branch table. They assign 
the functions polyd(), logd(), and maux_stat() to branch table entries 1, 2, 
and 3. Only external text symbols, such as C functions, should be placed in 
the branch table. 

Line 10 contains the #objects directive. Lines 11 through 14 give the 
list of object files that will be used to construct the host and target shared 
libraries. When building the host shared library archive, each file listed 
here will reside in its own archive member. When building the target 
library, the order of object files will be preserved. The data files must be 
first. Otherwise, an addition of static data to poly.o, for example, would 
move external data symbols and break compatibility. 

Line 15 contains the #objects noload directive, and line 16 gives infor- 
mation about where to resolve the references to the floating-point routines. 

Lines 17 through 19 contain the #hide linker and #export linker direc- 
tives, which tell what external symbols are to be left external after the 
shared library is built. Together, these #hide and #export directives say 
that only mauxerr will remain external. The symbols in the branch table 
and those specified in the #init directive will remain external by definition. 

Line 20 contains the #init directive. Lines 21 and 22 give imported 
symbol information for the object file maux_defs.o. You can imagine 
assignments of the symbol values on the right to the symbols on the left. 
Thus _libmaux_malloc will hold a pointer to malloc, and so on. 
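The effect of an imported symbol pointer can be imitated in an ordinary program. The following is a minimal sketch in ANSI C, not the mechanism mkshlib uses: demo_init() stands in for the assignment that the #init directive arranges, and every demo_ name is hypothetical. The sketch uses a macro named demo_malloc rather than redefining malloc itself, so that it can include <stdlib.h> without conflict.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct mstats. */
struct demo_stats {
    int st_total;
};

/* The imported symbol pointer, defined and set to zero as in maux_defs.c. */
void *(*demo_malloc_ptr)(size_t) = 0;

/* As in import.h: library code calls through the pointer transparently. */
#define demo_malloc (*demo_malloc_ptr)

/* Stand-in for the effect of #init: store the real address in the pointer. */
void
demo_init(void)
{
    demo_malloc_ptr = malloc;
}

/* Library code: allocates through the pointer and never names malloc. */
struct demo_stats *
demo_stat(int total)
{
    struct demo_stats *st;

    st = (struct demo_stats *) demo_malloc(sizeof(struct demo_stats));
    if (st == 0)
        return 0;
    st->st_total = total;
    return st;
}
```

A caller would run demo_init() once and then use demo_stat() as usual. Calling demo_stat() before the pointer is set would dereference a zero pointer, which is why the initialization has to happen before any library routine runs.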

Building the Shared Library 

Now, we have to compile the .o files as we would for any other library: 

        cc -c maux_defs.c poly.c log.c stats.c 

Next, we need to invoke mkshlib to build our host and target libraries: 

        mkshlib -s libmaux.sl -t libmaux_s -h libmaux_s.a 

Presuming all of the source files have been compiled appropriately, the 
mkshlib command line shown above will create both the host library, 
libmaux_s.a, and the target library, libmaux_s. Before any a.out files built 
with libmaux_s.a can be executed, the target shared library libmaux_s will 
have to be moved to /my/directory/libmaux_s as specified in the 
specification file. 

Using the Shared Library 

To use the shared library with a file x.c which contains a reference to 
one or more of the routines in libmaux, you would issue the following 
command line: 

        cc x.c libmaux_s.a -lm -lc_s 






This command line causes: 

■ the imported symbol pointer reference to log to be resolved from 
libm and 

■ the imported symbol pointer reference to malloc to be resolved with 
the shared version from libc_s. 

The most important thing to note from the command line, however, is that 
you have to specify the C host shared library (in this case with the -lc_s) 
on the command line since libmaux was built with direct references to the 
floating-point routines in that library. 






Summary 

This chapter describes the UNIX system shared libraries and explains 
how to use them. It also explains how to build your own shared libraries. 
Using any shared library almost always saves disk storage space, memory, 
and computer power; and running the UNIX system on smaller machines 
makes the efficient use of these resources increasingly important. Therefore, 
you should normally use a shared library whenever it's available. 



Introduction 



The UNIX system supports three types of Inter-Process Communication 
(IPC): 

■ messages 

■ semaphores 

■ shared memory 

This chapter describes the system calls for each type of IPC. 

Included in the chapter are several example programs that show the use 
of the IPC system calls. All of the example programs have been compiled 
and run on an AT&T 3B2 Computer. 

Since there are many ways in the C Programming Language to accom- 
plish the same task or requirement, keep in mind that the example pro- 
grams were written for clarity and not for program efficiency. Usually, sys- 
tem calls are embedded within a larger user-written program that makes use 
of a particular function that the calls provide. 

The message type of IPC allows processes (executing programs) to 
communicate through the exchange of data stored in buffers. This data is 
transmitted between processes in discrete portions called messages. 
Processes using this type of IPC can perform two operations: 

■ sending 

■ receiving 

Before a message can be sent or received by a process, a process must 
have the UNIX operating system generate the necessary software mechan- 
isms to handle these operations. A process does this by using the msgget(2) 
system call. While doing this, the process becomes the owner/creator of the 
message facility and specifies the initial operation permissions for all other 
processes, including itself. Subsequently, the owner/creator can relinquish 
ownership or change the operation permissions using the msgctl(2) system 
call. However, the creator remains the creator as long as the facility exists. 
Other processes with permission can use msgctl() to perform various other 
control functions. 

Processes which have permission and are attempting to send or receive 
a message can suspend execution if they are unsuccessful at performing 
their operation. That is, a process which is attempting to send a message 
can wait until the process which is to receive the message is ready and vice 
versa. A process which specifies that execution is to be suspended is per- 
forming a "blocking message operation." A process which does not allow its 
execution to be suspended is performing a "nonblocking message opera- 
tion." 

A process performing a blocking message operation can be suspended 
until one of three conditions occurs: 

■ It is successful. 

■ It receives a signal. 

■ The facility is removed. 

System calls make these message capabilities available to processes. The 
calling process passes arguments to a system call, and the system call either 
successfully or unsuccessfully performs its function. If the system call is 
successful, it performs its function and returns applicable information. 
Otherwise, a known error code (-1) is returned to the process, and an external 
error number variable errno is set accordingly. 
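The nonblocking case and the error return described above can be seen together in a short sketch. It assumes a system that provides the msgget(2), msgop(2), and msgctl(2) calls; the IPC_NOWAIT flag and the ENOMSG error come from the msgop(2) manual page rather than from this section, and the function name is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * Try to receive from an empty, private queue without blocking.
 * Returns 1 if msgrcv() failed at once with ENOMSG (the expected result),
 * 0 if anything else happened, and -1 if no queue could be created.
 */
int
demo_nonblocking_receive(void)
{
    struct {
        long mtype;
        char mtext[16];
    } buf;
    int msqid, saved_errno;
    long got;

    msqid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (msqid == -1)
        return -1;

    /* IPC_NOWAIT asks for a nonblocking operation: with no message on
     * the queue, the call returns -1 instead of suspending the process. */
    got = (long) msgrcv(msqid, &buf, sizeof(buf.mtext), 0L, IPC_NOWAIT);
    saved_errno = errno;            /* save before the next system call */

    /* Remove the facility so the sketch leaves nothing behind. */
    (void) msgctl(msqid, IPC_RMID, (struct msqid_ds *) 0);

    return (got == -1 && saved_errno == ENOMSG) ? 1 : 0;
}
```

Without IPC_NOWAIT the same msgrcv() call would be a blocking message operation, and the process would be suspended until a message arrived, a signal was received, or the queue was removed.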

Before a message can be sent or received, a uniquely identified message 
queue and data structure must be created. The unique identifier created is 
called the message queue identifier (msqid); it is used to identify or refer- 
ence the associated message queue and data structure. 

The message queue is used to store (header) information about each 
message that is being sent or received. This information includes the fol- 
lowing for each message: 

■ pointer to the next message on queue 

■ message type 

■ message text size 

■ message text address 

There is one associated data structure for the uniquely identified mes- 
sage queue. This data structure contains the following information related 
to the message queue: 

■ operation permissions data (operation permission structure) 

■ pointer to first message on the queue 

■ pointer to last message on the queue 

■ current number of bytes on the queue 

■ number of messages on the queue 

■ maximum number of bytes on the queue 

■ process identification (PID) of last message sender 

■ PID of last message receiver 

■ last message send time 

■ last message receive time 

■ last change time 



NOTE 



All include files discussed in this chapter are located in the /usr/include 
or /usr/include/sys directories. 



The C Programming Language data structure definition for the message 
information contained in the message queue is as follows: 



        struct msg 
        { 
                struct msg   *msg_next;   /* ptr to next message on q */ 
                long         msg_type;    /* message type */ 
                short        msg_ts;      /* message text size */ 
                short        msg_spot;    /* message text map address */ 
        }; 



It is located in the /usr/include/sys/msg.h header file. 

Likewise, the structure definition for the associated data structure is as 
follows: 

        struct msqid_ds 
        { 
                struct ipc_perm  msg_perm;     /* operation permission struct */ 
                struct msg       *msg_first;   /* ptr to first message on q */ 
                struct msg       *msg_last;    /* ptr to last message on q */ 
                ushort           msg_cbytes;   /* current # bytes on q */ 
                ushort           msg_qnum;     /* # of messages on q */ 
                ushort           msg_qbytes;   /* max # of bytes on q */ 
                ushort           msg_lspid;    /* pid of last msgsnd */ 
                ushort           msg_lrpid;    /* pid of last msgrcv */ 
                time_t           msg_stime;    /* last msgsnd time */ 
                time_t           msg_rtime;    /* last msgrcv time */ 
                time_t           msg_ctime;    /* last change time */ 
        }; 



It is located in the <sys/msg.h> header file also. Note that the 
msg_perm member of this structure uses ipc_perm as a template. The 
breakout for the operation permissions data structure is shown in Figure 
9-1. 



The definition of the ipc_perm data structure is as follows: 

        struct ipc_perm 
        { 
                ushort   uid;    /* owner's user id */ 
                ushort   gid;    /* owner's group id */ 
                ushort   cuid;   /* creator's user id */ 
                ushort   cgid;   /* creator's group id */ 
                ushort   mode;   /* access modes */ 
                ushort   seq;    /* slot usage sequence number */ 
                key_t    key;    /* key */ 
        }; 

Figure 9-1: ipc_perm Data Structure 



It is located in the #include <sys/ipc.h> header file; it is common for all 
IPC facilities. 

The msgget(2) system call is used to perform two tasks: 

■ to get a new msqid and create an associated message queue and data 
structure for it 

■ to return an existing msqid that already has an associated message 
queue and data structure 

Both tasks require a key argument passed to the msgget() system call. 
For the first task, if the key is not already in use for an existing msqid, a 
new msqid is returned with an associated message queue and data structure 
created for the key. This occurs provided no system tunable parameters 
would be exceeded and a control command IPC_CREAT is specified in the 
msgflg argument passed in the system call. 

There is also a provision for specifying a key of value zero which is 
known as the private key (IPC_PRIVATE = 0); when specified, a new msqid 
is always returned with an associated message queue and data structure 
created for it unless a system tunable parameter would be exceeded. When 
the ipcs command is performed, for security reasons the KEY field for the 
msqid is all zeros. 

For the second task, if a msqid exists for the key specified, the value of 
the existing msqid is returned. If you do not desire to have an existing 
msqid returned, a control command (IPC_EXCL) can be specified (set) in the 
msgflg argument passed to the system call. The details of using this system 
call are discussed in the "Using msgget" section of this chapter. 

When performing the first task, the process which calls msgget becomes 
the owner/creator, and the associated data structure is initialized accord- 
ingly. Remember, ownership can be changed but the creating process 
always remains the creator; see the "Controlling Message Queues" section 
in this chapter. The creator of the message queue also determines the initial 
operation permissions for it. 

Once a uniquely identified message queue and data structure are 
created, message operations [msgop()] and message control [msgctl()] can be 
used. 

Message operations, as mentioned previously, consist of sending and 
receiving messages. System calls are provided for each of these operations; 
they are msgsnd() and msgrcv(). Refer to the "Operations for Messages" sec- 
tion in this chapter for details of these system calls. 

Message control is done by using the msgctl(2) system call. It permits 
you to control the message facility in the following ways: 

■ to determine the associated data structure status for a message queue 
identifier (msqid) 

■ to change operation permissions for a message queue 

■ to change the size (msg_qbytes) of the message queue for a particular 
msqid 

■ to remove a particular msqid from the UNIX operating system along 
with its associated message queue and data structure 
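Three of the control functions listed above can be sketched as follows. The command names IPC_STAT, IPC_SET, and IPC_RMID are taken from the msgctl(2) manual page, not from this section, and the function name is hypothetical. The sketch creates a private queue, changes its operation permissions, reads them back, and removes the queue.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * Create a private queue, change its operation permissions to "read by
 * user" only, and remove the queue again. Returns the permission bits
 * read back after the change (expected 0400), or -1 on any failure.
 */
int
demo_change_permissions(void)
{
    struct msqid_ds ds;
    int msqid, mode = -1;

    msqid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (msqid == -1)
        return -1;

    if (msgctl(msqid, IPC_STAT, &ds) == 0) {        /* read the status */
        ds.msg_perm.mode = 0400;                    /* read by user only */
        if (msgctl(msqid, IPC_SET, &ds) == 0 &&     /* change permissions */
            msgctl(msqid, IPC_STAT, &ds) == 0)      /* read them back */
            mode = (int) (ds.msg_perm.mode & 0777);
    }

    /* Remove the msqid with its message queue and data structure. */
    (void) msgctl(msqid, IPC_RMID, (struct msqid_ds *) 0);
    return mode;
}
```

The structure is filled in by IPC_STAT before it is handed to IPC_SET, so the owner, group, and queue size keep their current values and only the permissions change.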

Refer to the "Controlling Message Queues" section in this chapter for 
details of the msgctl() system call. 

Getting Message Queues 

This section gives a detailed description of using the msgget(2) system 
call along with an example program illustrating its use. 

Using msgget 

The synopsis found in the msgget(2) entry in the Programmer's Reference 
Manual is as follows: 



        #include <sys/types.h> 
        #include <sys/ipc.h> 
        #include <sys/msg.h> 

        int msgget (key, msgflg) 
        key_t key; 
        int msgflg; 



All of these include files are located in the /usr/include/sys directory 
of the UNIX operating system. 

The following line in the synopsis: 

int msgget (key, msgflg) 

informs you that msgget() is a function with two formal arguments that 
returns an integer type value, upon successful completion (msqid). The 
next two lines: 

key_t key; 
int msgflg; 

declare the types of the formal arguments. key_t is declared by a typedef 
in the types.h header file to be an integer. 

The integer returned from this function upon successful completion is 
the message queue identifier (msqid) that was discussed earlier. 

As declared, the process calling the msgget() system call must supply 
two arguments to be passed to the formal key and msgflg arguments. 

A new msqid with an associated message queue and data structure is 
provided if either 

■ key is equal to IPC_PRIVATE, 

or 

■ key is passed a unique hexadecimal integer, and a control command 
IPC_CREAT is specified in the msgflg argument. 

The value passed to the msgflg argument must be an integer type value 
and it will specify the following: 

■ access permissions 

■ execution modes 

■ control fields (commands) 

Access permissions determine the read/write attributes and execution 
modes determine the user/group/other attributes of the msgflg argument. 
They are collectively referred to as "operation permissions." Figure 9-2 
reflects the numeric values (expressed in octal notation) for the valid 
operation permissions codes. 



        Operation Permissions     Octal Value 

        Read by User              00400 
        Write by User             00200 
        Read by Group             00040 
        Write by Group            00020 
        Read by Others            00004 
        Write by Others           00002 

Figure 9-2: Operation Permissions Codes 



A specific octal value is derived by adding the octal values for the operation 
permissions desired. That is, if read by user and read/write by others is 
desired, the code value would be 00406 (00400 plus 00006). There are 
constants located in the msg.h header file which can be used for the user 
operations permissions. 
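The arithmetic above can be written out in C. The macro names below are hypothetical and are used only for this sketch (they are not the constants from msg.h); the values are the ones listed in Figure 9-2. Because each permission occupies its own bit, adding the values and ORing them give the same result.

```c
/* Octal values from Figure 9-2; the DEMO_ names are hypothetical. */
#define DEMO_READ_USER     00400
#define DEMO_WRITE_USER    00200
#define DEMO_READ_GROUP    00040
#define DEMO_WRITE_GROUP   00020
#define DEMO_READ_OTHERS   00004
#define DEMO_WRITE_OTHERS  00002

/* Read by user plus read/write by others: 00400 + 00004 + 00002 = 00406. */
int
demo_opperm(void)
{
    return DEMO_READ_USER | DEMO_READ_OTHERS | DEMO_WRITE_OTHERS;
}
```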

Control commands are predefined constants (represented by all 
uppercase letters). Figure 9-3 contains the names of the constants which apply to 
the msgget() system call along with their values. They are also referred to 
as flags and are defined in the ipc.h header file. 



        Control Command     Value 

        IPC_CREAT           0001000 
        IPC_EXCL            0002000 

Figure 9-3: Control Commands (Flags) 



The value for msgflg is therefore a combination of operation 
permissions and control commands. After determining the value for the operation 
permissions as previously described, the desired flag(s) can be specified. 
This is accomplished by bitwise ORing (|) them with the operation 
permissions; the bit positions and values for the control commands in relation to 
those of the operation permissions make this possible. It is illustrated as 
follows: 

                              Octal Value     Binary Value 

        IPC_CREAT         =   01000           000 001 000 000 000 
        | ORed by User    =   00400           000 000 100 000 000 

        msgflg            =   01400           000 001 100 000 000 



The msgflg value can be easily set by using the names of the flags in 
conjunction with the octal operation permissions value: 

        msqid = msgget (key, (IPC_CREAT | 0400)); 

        msqid = msgget (key, (IPC_CREAT | IPC_EXCL | 0400)); 

As specified by the msgget(2) page in the Programmer's Reference Manual, 
success or failure of this system call depends upon the argument values for 
key and msgflg or system tunable parameters. The system call will attempt 
to return a new msqid if one of the following conditions is true: 

■ Key is equal to IPC_PRIVATE (0) 

■ Key does not already have a msqid associated with it, and (msgflg & 
IPC_CREAT) is "true" (not zero). 
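The test "(msgflg & IPC_CREAT) is true" works because the control commands and the operation permissions occupy different bits of msgflg. The sketch below shows how a msgflg value can be put together and taken apart again; the demo_ function names are hypothetical, and the mask 0777 is simply the nine permission bits.

```c
#include <sys/types.h>
#include <sys/ipc.h>

/* Build msgflg from an operation permissions code and control commands. */
int
demo_msgflg(int opperm, int flags)
{
    return opperm | flags;
}

/* Recover the nine operation permission bits from a msgflg value. */
int
demo_opperm_of(int msgflg)
{
    return msgflg & 0777;
}

/* Nonzero if IPC_CREAT is set in msgflg. */
int
demo_wants_create(int msgflg)
{
    return (msgflg & IPC_CREAT) != 0;
}
```

For example, demo_msgflg(0400, IPC_CREAT | IPC_EXCL) yields a value from which demo_opperm_of() returns 0400 and for which demo_wants_create() is nonzero.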

The key argument can be set to IPC_PRIVATE in the following ways: 

        msqid = msgget (IPC_PRIVATE, msgflg); 

or 

        msqid = msgget (0, msgflg); 

This alone will cause the system call to be attempted because it satisfies the 
first condition specified. Exceeding the MSGMNI system tunable parameter 
always causes a failure. The MSGMNI system tunable parameter determines 
the maximum number of unique message queues (msqid's) in the UNIX 
operating system. 

The second condition is satisfied if the value for key is not already asso- 
ciated with a msqid and IPC_CREAT is specified in the msgflg argument. 
This means that the key is unique (not in use) within the UNIX operating 
system for this facility type and that the IPC_CREAT flag is set (msgflg | 
IPC_CREAT). 

IPC_EXCL is another control command. Used in conjunction with 
IPC_CREAT, it causes the system call to fail if, and only if, a msqid 
already exists for the specified key. This is necessary to prevent the 
process from thinking that it has received a new (unique) msqid when it has 
not. In other words, when both IPC_CREAT and IPC_EXCL are specified, a 
msqid returned by a successful system call is always a new one. 
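The behavior of IPC_EXCL can be observed with two calls on the same key. This is a minimal sketch, assuming a system that provides msgget(2) and msgctl(2); the EEXIST error name comes from the msgget(2) manual page, and the function name and the key a caller passes in are hypothetical. The key must not be IPC_PRIVATE, since a private key always produces a new msqid.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * Returns 1 if a second exclusive create on the same key fails with
 * EEXIST, 0 if it does not, and -1 if the first create itself failed
 * (for example, because the key was already in use).
 */
int
demo_exclusive_create(key_t key)
{
    int first, second, saved_errno;

    first = msgget(key, IPC_CREAT | IPC_EXCL | 0600);
    if (first == -1)
        return -1;

    /* The key now has a msqid, so an exclusive create must fail. */
    second = msgget(key, IPC_CREAT | IPC_EXCL | 0600);
    saved_errno = errno;            /* save before the next system call */

    /* Remove the queue so the sketch leaves nothing behind. */
    (void) msgctl(first, IPC_RMID, (struct msqid_ds *) 0);

    return (second == -1 && saved_errno == EEXIST) ? 1 : 0;
}
```

Had the second call omitted IPC_EXCL, it would have succeeded and returned the existing msqid, which is exactly the case the flag is meant to detect.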

Refer to the msgget(2) page in the Programmer's Reference Manual for 
specific associated data structure initialization for successful completion. 
The specific failure conditions with error names are contained there also. 
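
The effect of combining IPC_CREAT with IPC_EXCL can be sketched as follows. This is only an illustration, not part of the example program: the function name create_new_queue is hypothetical, and the error names tested (EEXIST, ENOSPC) are the ones documented for msgget(2) on current POSIX-style systems, which is an assumption about the system you are using.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical helper: create a brand-new queue for key, and fail
 * rather than silently reuse an existing one.
 * Returns the msqid, or -1 with errno as set by msgget(). */
int create_new_queue(key_t key, int opperm)
{
    int msqid = msgget(key, opperm | IPC_CREAT | IPC_EXCL);

    if (msqid == -1) {
        int saved = errno;   /* the stdio calls below may change errno */

        if (saved == EEXIST)
            fprintf(stderr, "key 0x%lx already has a msqid\n", (long)key);
        else if (saved == ENOSPC)
            fprintf(stderr, "system limit on message queues reached\n");
        else
            fprintf(stderr, "msgget failed, errno = %d\n", saved);
        errno = saved;
    }
    return msqid;
}
```

Calling create_new_queue twice with the same key shows the point of IPC_EXCL: the first call returns a new msqid, and the second returns -1 instead of handing back the identifier that already exists.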

Example Program 

The example program in this section (Figure 9-4) is a menu driven
program which allows all possible combinations of using the msgget(2)
system call to be exercised.

From studying this program, you can observe the method of passing
arguments and receiving return values. The user-written program
requirements are pointed out.

This program begins (lines 4-8) by including the required header files as
specified by the msgget(2) entry in the Programmer's Reference Manual. Note
that the errno.h header file is included as opposed to declaring errno as an
external variable; either method will work.

Variable names have been chosen to be as close as possible to those in 
the synopsis for the system call. Their declarations are self-explanatory. 
These names make the program more readable, and it is perfectly legal since 
they are local to the program. The variables declared for this program and 
their purposes are as follows: 

■ key— used to pass the value for the desired key 

■ opperm— used to store the desired operation permissions 

■ flags — used to store the desired control commands (flags) 

■ opperm_flags — used to store the combination from the logical ORing 
of the opperm and flags variables; it is then used in the system call 
to pass the msgflg argument 

■ msqid— used for returning the message queue identification number 
for a successful system call or the error code (-1) for an unsuccessful 
one. 

The program begins by prompting for a hexadecimal key, an octal 
operation permissions code, and finally for the control command combina- 
tions (flags) which are selected from a menu (lines 15-32). All possible com- 
binations are allowed even though they might not be viable. This allows 
observing the errors for illegal combinations. 

Next, the menu selection for the flags is combined with the operation 
permissions, and the result is stored at the address of the opperm_flags vari- 
able (lines 36-51). 
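
The combining step just described can be isolated in a small function. The sketch below mirrors the switch statement of Figure 9-4; the name build_msgflg and the -1 return for an entry outside the menu are conventions of this illustration only.

```c
#include <sys/types.h>
#include <sys/ipc.h>

/* Combine octal operation permissions with a menu choice (0-3).
 * Returns the msgflg value, or -1 if the choice is not on the menu. */
int build_msgflg(int opperm, int choice)
{
    switch (choice) {
    case 0:                 /* no flags */
        return opperm;
    case 1:                 /* IPC_CREAT */
        return opperm | IPC_CREAT;
    case 2:                 /* IPC_EXCL */
        return opperm | IPC_EXCL;
    case 3:                 /* IPC_CREAT and IPC_EXCL */
        return opperm | IPC_CREAT | IPC_EXCL;
    default:
        return -1;
    }
}
```

For example, build_msgflg(0400, 1) yields the same value as the expression (IPC_CREAT | 0400) used earlier in this section.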

The system call is made next, and the result is stored at the address of
the msqid variable (line 53).

Since the msqid variable now contains a valid message queue identifier
or the error code (-1), it is tested to see if an error occurred (line 55). If
msqid equals -1, a message indicates that an error resulted, and the
external errno variable is displayed (lines 57, 58).

If no error occurred, the returned message queue identifier is displayed 
(line 62). 

The example program for the msgget(2) system call follows. It is
suggested that the source program file be named msgget.c and that the
executable file be named msgget.

  1   /*This is a program to illustrate
  2   **the message get, msgget(),
  3   **system call capabilities.*/
  4   #include <stdio.h>
  5   #include <sys/types.h>
  6   #include <sys/ipc.h>
  7   #include <sys/msg.h>
  8   #include <errno.h>
  9   /*Start of main C language program*/
 10   main()
 11   {
 12       key_t key;    /*declare as long integer*/
 13       int opperm, flags;
 14       int msqid, opperm_flags;
 15       /*Enter the desired key*/
 16       printf("Enter the desired key in hex = ");
 17       scanf("%x", &key);
 18       /*Enter the desired octal operation
 19         permissions.*/
 20       printf("\nEnter the operation\n");
 21       printf("permissions in octal = ");
 22       scanf("%o", &opperm);
 23       /*Set the desired flags.*/
 24       printf("\nEnter corresponding number to\n");
 25       printf("set the desired flags:\n");
 26       printf("No flags               = 0\n");
 27       printf("IPC_CREAT              = 1\n");
 28       printf("IPC_EXCL               = 2\n");
 29       printf("IPC_CREAT and IPC_EXCL = 3\n");
 30       printf("         Flags         = ");
 31       /*Get the flag(s) to be set.*/
 32       scanf("%d", &flags);
 33       /*Check the values.*/
 34       printf("\nkey =0x%x, opperm = 0%o, flags = 0%o\n",
 35           key, opperm, flags);
 36       /*Incorporate the control fields (flags) with
 37         the operation permissions*/
 38       switch (flags)
 39       {
 40       case 0:    /*No flags are to be set.*/
 41           opperm_flags = (opperm | 0);
 42           break;
 43       case 1:    /*Set the IPC_CREAT flag.*/
 44           opperm_flags = (opperm | IPC_CREAT);
 45           break;
 46       case 2:    /*Set the IPC_EXCL flag.*/
 47           opperm_flags = (opperm | IPC_EXCL);
 48           break;
 49       case 3:    /*Set the IPC_CREAT and IPC_EXCL flags.*/
 50           opperm_flags = (opperm | IPC_CREAT | IPC_EXCL);
 51       }
 52       /*Call the msgget system call.*/
 53       msqid = msgget(key, opperm_flags);
 54       /*Perform the following if the call is unsuccessful.*/
 55       if(msqid == -1)
 56       {
 57           printf("\nThe msgget system call failed!\n");
 58           printf("The error number = %d\n", errno);
 59       }
 60       /*Return the msqid upon successful completion.*/
 61       else
 62           printf("\nThe msqid = %d\n", msqid);
 63       exit(0);
 64   }




Figure 9-4: msgget() System Call Example 



Controlling Message Queues 

This section gives a detailed description of using the msgctl system call 
along with an example program which allows all of its capabilities to be 
exercised. 

Using msgctl 

The synopsis found in the msgctl(2) entry in the Programmer's Reference
Manual is as follows:

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    int msgctl (msqid, cmd, buf)
    int msqid, cmd;
    struct msqid_ds *buf;

The msgctl() system call requires three arguments to be passed to it, and it
returns an integer value.

Upon successful completion, a zero value is returned; when unsuccessful,
it returns a -1.

The msqid variable must be a valid, non-negative, integer value. In 
other words, it must have already been created by using the msgget() sys- 
tem call. 

The cmd argument can be replaced by one of the following control 
commands (flags): 

IPC_STAT return the status information contained in the associated 
data structure for the specified msqid, and place it in the 
data structure pointed to by the *buf pointer in the user 
memory area. 

IPC_SET for the specified msqid, set the effective user and group 
identification, operation permissions, and the number of 
bytes for the message queue. 

IPC_RMID remove the specified msqid along with its associated mes- 
sage queue and data structure. 

A process must have an effective user identification of
OWNER/CREATOR or super-user to perform an IPC_SET or IPC_RMID
control command. Read permission is required to perform the IPC_STAT
control command.

The details of this system call are discussed in the example program for 
it. If you have problems understanding the logic manipulations in this pro- 
gram, read the msgget(2) section of the "UNIX Programming Manual"; it 
goes into more detail than what would be practical for this document. 
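
As a compact illustration of the IPC_STAT and IPC_RMID control commands described above, the following sketch wraps each in a small function. The function names are hypothetical, and the casts in the printf() call are there because the exact types of the structure members differ between systems.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Copy the queue's status into *out (IPC_STAT). Returns 0 or -1. */
int queue_status(int msqid, struct msqid_ds *out)
{
    return msgctl(msqid, IPC_STAT, out);
}

/* Print the members that the IPC_SET control command can change. */
void print_settable(const struct msqid_ds *ds)
{
    printf("uid = %ld, gid = %ld, mode = 0%o, msg_qbytes = %lu\n",
           (long)ds->msg_perm.uid, (long)ds->msg_perm.gid,
           (unsigned)(ds->msg_perm.mode & 0777),
           (unsigned long)ds->msg_qbytes);
}

/* Remove the queue; the buf argument is not used by IPC_RMID. */
int remove_queue(int msqid)
{
    return msgctl(msqid, IPC_RMID, (struct msqid_ds *)0);
}
```

A caller would typically check the return value of queue_status() before printing, since the contents of the structure are not meaningful after a failed call.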

Example Program 

The example program in this section (Figure 9-5) is a menu driven pro- 
gram which allows all possible combinations of using the msgctl(2) system 
call to be exercised. 

From studying this program, you can observe the method of passing
arguments and receiving return values. The user-written program
requirements are pointed out.

This program begins (lines 5-9) by including the required header files as
specified by the msgctl(2) entry in the Programmer's Reference Manual. Note
in this program that errno is declared as an external variable, and therefore,
the errno.h header file does not have to be included.

Variable and structure names have been chosen to be as close as possible 
to those in the synopsis for the system call. Their declarations are self- 
explanatory. These names make the program more readable, and it is per- 
fectly legal since they are local to the program. The variables declared for 
this program and their purpose are as follows: 

uid         used to store the IPC_SET value for the effective user
            identification

gid         used to store the IPC_SET value for the effective group
            identification

mode        used to store the IPC_SET value for the operation
            permissions

bytes       used to store the IPC_SET value for the number of bytes in
            the message queue (msg_qbytes)

rtrn        used to store the return integer value from the system call

msqid       used to store and pass the message queue identifier to the
            system call

command     used to store the code for the desired control command so
            that subsequent processing can be performed on it

choice      used to determine which member is to be changed for the
            IPC_SET control command

msqid_ds    used to receive the specified message queue identifier's
            data structure when an IPC_STAT control command is
            performed

*buf        a pointer passed to the system call which locates the data
            structure in the user memory area where the IPC_STAT
            control command is to place its return values or where the
            IPC_SET command gets the values to set

Note that the msqid_ds data structure in this program (line 16) uses the
data structure located in the msg.h header file of the same name as a
template for its declaration. This is a perfect example of the advantage of
local variables.

The next important thing to observe is that although the *buf pointer is
declared to be a pointer to a data structure of the msqid_ds type, it must
also be initialized to contain the address of the user memory area data
structure (line 17). Now that all of the required declarations have been
explained for this program, this is how it works.

First, the program prompts for a valid message queue identifier which is
stored at the address of the msqid variable (lines 19, 20). This is required
for every msgctl system call.

Then the code for the desired control command must be entered (lines 
21-27), and it is stored at the address of the command variable. The code is 
tested to determine the control command for subsequent processing. 

If the IPC_STAT control command is selected (code 1), the system call is
performed (lines 37, 38) and the status information returned is printed out
(lines 39-46); only the members that can be set are printed out in this
program. Note that if the system call is unsuccessful (line 106), the status
information of the last successful call is printed out. In addition, an error
message is displayed and the errno variable is printed out (lines 108, 109).
If the system call is successful, a message indicates this along with the
message queue identifier used (lines 111-114).

If the IPC_SET control command is selected (code 2), the first thing
done is to get the current status information for the message queue
identifier specified (lines 50-52). This is necessary because this example
program provides for changing only one member at a time, and the system call
changes all of them. Also, if an invalid value happened to be stored in the
user memory area for one of these members, it would cause repetitive
failures for this control command until corrected. The next thing the
program does is to prompt for a code corresponding to the member to be
changed (lines 53-59). This code is stored at the address of the choice
variable (line 60). Now, depending upon the member picked, the program
prompts for the new value (lines 66-95). The value is placed at the address
of the appropriate member in the user memory area data structure, and the
system call is made (lines 96-98). Depending upon success or failure, the
program returns the same messages as for IPC_STAT above.

If the IPC_RMID control command (code 3) is selected, the system call is
performed (lines 100-103), and the msqid along with its associated message 
queue and data structure are removed from the UNIX operating system. 
Note that the *buf pointer is not required as an argument to perform this 
control command, and its value can be zero or NULL. Depending upon the 
success or failure, the program returns the same messages as for the other 
control commands. 
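
The reason for fetching the current status before an IPC_SET can be seen in a few lines. The sketch below changes only the operation permissions and leaves the other settable members as they were; the name set_queue_mode is hypothetical, and the sketch assumes the caller has read permission on the queue so that IPC_STAT succeeds.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Change only the permission bits of a queue.
 * IPC_SET stores uid, gid, mode, and msg_qbytes together, so the
 * current values are read first and only mode is altered. */
int set_queue_mode(int msqid, int mode)
{
    struct msqid_ds ds;

    if (msgctl(msqid, IPC_STAT, &ds) == -1)
        return -1;          /* could not read the current values */
    ds.msg_perm.mode = (ds.msg_perm.mode & ~0777) | (mode & 0777);
    return msgctl(msqid, IPC_SET, &ds);
}
```

Passing an uninitialized structure to IPC_SET instead would overwrite the owner, group, and msg_qbytes with whatever happened to be in memory, which is the kind of repetitive failure the text warns about.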

The example program for the msgctl() system call follows. It is
suggested that the source program file be named msgctl.c and that the
executable file be named msgctl.

  1   /*This is a program to illustrate
  2   **the message control, msgctl(),
  3   **system call capabilities.
  4   */
  5   /*Include necessary header files.*/
  6   #include <stdio.h>
  7   #include <sys/types.h>
  8   #include <sys/ipc.h>
  9   #include <sys/msg.h>
 10   /*Start of main C language program*/
 11   main()
 12   {
 13       extern int errno;
 14       int uid, gid, mode, bytes;
 15       int rtrn, msqid, command, choice;
 16       struct msqid_ds msqid_ds, *buf;
 17       buf = &msqid_ds;
 18       /*Get the msqid, and command.*/
 19       printf("Enter the msqid = ");
 20       scanf("%d", &msqid);
 21       printf("\nEnter the number for\n");
 22       printf("the desired command:\n");
 23       printf("IPC_STAT = 1\n");
 24       printf("IPC_SET  = 2\n");
 25       printf("IPC_RMID = 3\n");
 26       printf("Entry    = ");
 27       scanf("%d", &command);
 28       /*Check the values.*/
 29       printf("\nmsqid =%d, command = %d\n",
 30           msqid, command);
 31       switch (command)
 32       {
 33       case 1:    /*Use msgctl() to duplicate
 34                    the data structure for
 35                    msqid in the msqid_ds area pointed
 36                    to by buf and then print it out.*/
 37           rtrn = msgctl(msqid, IPC_STAT,
 38               buf);
 39           printf("\nThe USER ID = %d\n",
 40               buf->msg_perm.uid);
 41           printf("The GROUP ID = %d\n",
 42               buf->msg_perm.gid);
 43           printf("The operation permissions = 0%o\n",
 44               buf->msg_perm.mode);
 45           printf("The msg_qbytes = %d\n",
 46               buf->msg_qbytes);
 47           break;
 48       case 2:    /*Select and change the desired
 49                    member(s) of the data structure.*/
 50           /*Get the original data for this msqid
 51             data structure first.*/
 52           rtrn = msgctl(msqid, IPC_STAT, buf);
 53           printf("\nEnter the number for the\n");
 54           printf("member to be changed:\n");
 55           printf("msg_perm.uid  = 1\n");
 56           printf("msg_perm.gid  = 2\n");
 57           printf("msg_perm.mode = 3\n");
 58           printf("msg_qbytes    = 4\n");
 59           printf("Entry         = ");
 60           scanf("%d", &choice);
 61           /*Only one choice is allowed per
 62             pass as an illegal entry will
 63             cause repetitive failures until
 64             msqid_ds is updated with
 65             IPC_STAT.*/
 66           switch(choice){
 67           case 1:
 68               printf("\nEnter USER ID = ");
 69               scanf("%d", &uid);
 70               buf->msg_perm.uid = uid;
 71               printf("\nUSER ID = %d\n",
 72                   buf->msg_perm.uid);
 73               break;
 74           case 2:
 75               printf("\nEnter GROUP ID = ");
 76               scanf("%d", &gid);
 77               buf->msg_perm.gid = gid;
 78               printf("\nGROUP ID = %d\n",
 79                   buf->msg_perm.gid);
 80               break;
 81           case 3:
 82               printf("\nEnter MODE = ");
 83               scanf("%o", &mode);
 84               buf->msg_perm.mode = mode;
 85               printf("\nMODE = 0%o\n",
 86                   buf->msg_perm.mode);
 87               break;
 88           case 4:
 89               printf("\nEnter msg_qbytes = ");
 90               scanf("%d", &bytes);
 91               buf->msg_qbytes = bytes;
 92               printf("\nmsg_qbytes = %d\n",
 93                   buf->msg_qbytes);
 94               break;
 95           }
 96           /*Do the change.*/
 97           rtrn = msgctl(msqid, IPC_SET,
 98               buf);
 99           break;
100       case 3:    /*Remove the msqid along with its
101                    associated message queue
102                    and data structure.*/
103           rtrn = msgctl(msqid, IPC_RMID, NULL);
104       }
105       /*Perform the following if the call is unsuccessful.*/
106       if(rtrn == -1)
107       {
108           printf("\nThe msgctl system call failed!\n");
109           printf("The error number = %d\n", errno);
110       }
111       /*Return the msqid upon successful completion.*/
112       else
113           printf("\nMsgctl was successful for msqid = %d\n",
114               msqid);
115       exit(0);
116   }





Figure 9-5: msgctl() System Call Example 



Operations for Messages 

This section gives a detailed description of using the msgsnd(2) and 
msgrcv(2) system calls, along with an example program which allows all of 
their capabilities to be exercised. 

Using msgop 

The synopsis found in the msgop(2) entry in the Programmer's Reference
Manual is as follows:

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    int msgsnd (msqid, msgp, msgsz, msgflg)
    int msqid;
    struct msgbuf *msgp;
    int msgsz, msgflg;

    int msgrcv (msqid, msgp, msgsz, msgtyp, msgflg)
    int msqid;
    struct msgbuf *msgp;
    int msgsz;
    long msgtyp;
    int msgflg;




Sending a Message 

The msgsnd system call requires four arguments to be passed to it. It 
returns an integer value. 

Upon successful completion, a zero value is returned; when unsuccessful,
msgsnd() returns a -1.

The msqid argument must be a valid, non-negative, integer value. In 
other words, it must have already been created by using the msgget() sys- 
tem call. 

The msgp argument is a pointer to a structure in the user memory area 
that contains the type of the message and the message to be sent. 

The msgsz argument specifies the length of the character array in the
data structure pointed to by the msgp argument. This is the length of the
message. The maximum size of this array is determined by the MSGMAX
system tunable parameter.

The msgflg argument allows the "blocking message operation" to be
performed if the IPC_NOWAIT flag is not set ((msgflg & IPC_NOWAIT) == 0);
this would occur if the total number of bytes allowed on the specified
message queue are in use (msg_qbytes or MSGMNB), or the total system-wide
number of messages on all queues is equal to the system imposed limit
(MSGTQL). If the IPC_NOWAIT flag is set, the system call will fail under
these conditions and return a -1 instead of blocking.

The msg_qbytes data structure member can be lowered from MSGMNB 
by using the msgctl() IPC_SET control command, but only the super-user 
can raise it afterwards. 
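
The arguments to msgsnd() can be seen together in a short sketch. The structure text_msg, the TEXT_MAX size, and the function name send_text are all choices made for this illustration; the only requirement taken from the synopsis is that the structure begin with a long message type followed by the message text.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define TEXT_MAX 256            /* buffer size chosen for this sketch */

struct text_msg {
    long mtype;                 /* message type, must be greater than zero */
    char mtext[TEXT_MAX];       /* message text */
};

/* Send a null-terminated string as one message.
 * If nowait is nonzero, IPC_NOWAIT is set and a full queue makes the
 * call fail at once instead of blocking. Returns 0 or -1. */
int send_text(int msqid, long type, const char *text, int nowait)
{
    struct text_msg m;
    size_t len = strlen(text);

    if (len > TEXT_MAX)
        len = TEXT_MAX;         /* cut the text to fit the sketch's buffer */
    m.mtype = type;
    memcpy(m.mtext, text, len);
    /* msgsz counts only the bytes of mtext, not the mtype member. */
    return msgsnd(msqid, &m, len, nowait ? IPC_NOWAIT : 0);
}
```

Note that the terminating null character is not sent; a receiver that wants a C string must add it using the byte count returned by msgrcv().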

Further details of this system call are discussed in the example program 
for it. If you have problems understanding the logic manipulations in this 
program, read the "Using msgget" section of this chapter; it goes into more 
detail than what would be practical to do for every system call. 

Receiving Messages

The msgrcv() system call requires five arguments to be passed to it, and
it returns an integer value.

Upon successful completion, a value equal to the number of bytes 
received is returned and when unsuccessful it returns a -1. 

The msqid argument must be a valid, non-negative, integer value. In 
other words, it must have already been created by using the msgget() sys- 
tem call. 

The msgp argument is a pointer to a structure in the user memory area 
that will receive the message type and the message text. 

The msgsz argument specifies the length of the message to be received.
If its value is less than the length of the message text on the queue, an
error can be returned if desired; see the msgflg argument.

The msgtyp argument is used to pick the first message on the message 
queue of the particular type specified. If it is equal to zero, the first mes- 
sage on the queue is received; if it is greater than zero, the first message of 
the same type is received; if it is less than zero, the lowest type that is less 
than or equal to its absolute value is received. 
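
The three msgtyp cases can be modeled without a real queue. In the sketch below, an array of message types in arrival order stands in for the queue, and the function returns the index of the message that would be received under the rule just described. The function name pick_message and the array representation are inventions of this illustration, not part of the system call interface.

```c
#include <stddef.h>

/* Model of the msgtyp rule. types[] holds the types of the messages
 * on a queue in arrival order. Returns the index of the message that
 * would be received, or -1 if no message qualifies. */
int pick_message(const long *types, size_t n, long msgtyp)
{
    size_t i;
    int best = -1;

    for (i = 0; i < n; i++) {
        if (msgtyp == 0)
            return (int)i;              /* first message on the queue */
        if (msgtyp > 0) {
            if (types[i] == msgtyp)
                return (int)i;          /* first message of that type */
        } else if (types[i] <= -msgtyp &&
                   (best == -1 || types[i] < types[best])) {
            best = (int)i;              /* lowest qualifying type so far */
        }
    }
    return best;
}
```

With the types 3, 1, 2, 1 on the queue, a msgtyp of 0 selects the first message (type 3), a msgtyp of 2 selects the third message, and a msgtyp of -2 selects the second message, which is the earliest one carrying the lowest type not greater than 2.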

The msgflg argument allows the "blocking message operation" to be
performed if the IPC_NOWAIT flag is not set ((msgflg & IPC_NOWAIT) == 0);
this would occur if there is not a message on the message queue of the
desired type (msgtyp) to be received. If the IPC_NOWAIT flag is set, the
system call will fail immediately when there is not a message of the desired
type on the queue. Msgflg can also specify that the system call fail if the
message is longer than the size to be received; this is done by not setting
the MSG_NOERROR flag in the msgflg argument ((msgflg &
MSG_NOERROR) == 0). If the MSG_NOERROR flag is set, the message is
truncated to the length specified by the msgsz argument of msgrcv().

Further details of this system call are discussed in the example program 
for it. If you have problems understanding the logic manipulations in this 
program, read the "Using msgget" section of this chapter; it goes into more 
detail than what would be practical to do for every system call. 
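
The interaction of msgsz with the MSG_NOERROR flag described in this section is sketched below. The structure rcv_msg, the RCV_MAX size, and the function name recv_text are assumptions of the illustration; IPC_NOWAIT is always set here so that the sketch never blocks. The error names E2BIG and ENOMSG are those documented for msgrcv(2) on current POSIX-style systems.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define RCV_MAX 256             /* buffer size chosen for this sketch */

struct rcv_msg {
    long mtype;                 /* receives the message type */
    char mtext[RCV_MAX];        /* receives the message text */
};

/* Receive up to want bytes of the first message of the given type
 * without blocking. If truncate_ok is nonzero, MSG_NOERROR is set and
 * a longer message is cut to want bytes; otherwise the call fails
 * and the message stays on the queue.
 * Returns the number of bytes stored in out->mtext, or -1. */
int recv_text(int msqid, long msgtyp, struct rcv_msg *out,
              size_t want, int truncate_ok)
{
    int msgflg = IPC_NOWAIT;

    if (want > RCV_MAX)
        want = RCV_MAX;         /* never ask for more than the buffer holds */
    if (truncate_ok)
        msgflg |= MSG_NOERROR;
    return (int)msgrcv(msqid, out, want, msgtyp, msgflg);
}
```

If an 11-byte message is waiting and want is 5, the call fails with errno set to E2BIG when truncate_ok is zero, and returns 5 when truncate_ok is nonzero. Once the queue is empty, the call fails with errno set to ENOMSG because IPC_NOWAIT is set.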

Example Program 

The example program in this section (Figure 9-6) is a menu driven pro- 
gram which allows all possible combinations of using the msgsnd() and 
msgrcv(2) system calls to be exercised. 

From studying this program, you can observe the method of passing 
arguments and receiving return values. The user-written program require- 
ments are pointed out. 

This program begins (lines 5-9) by including the required header files as
specified by the msgop(2) entry in the Programmer's Reference Manual. Note
that in this program errno is declared as an external variable, and therefore,
the errno.h header file does not have to be included.

Variable and structure names have been chosen to be as close as possible 
to those in the synopsis. Their declarations are self-explanatory. These 
names make the program more readable, and this is perfectly legal since 
they are local to the program. The variables declared for this program and 
their purposes are as follows: 

sndbuf      used as a buffer to contain a message to be sent (line 13); it
            uses the msgbuf1 data structure as a template (lines 10-13).
            The msgbuf1 structure is a duplicate of the msgbuf structure
            contained in the msg.h header file, except that the size of
            its character array is tailored to fit this application: the
            array is sized to the maximum message size (MSGMAX) for
            the computer. For this reason msgbuf cannot be used directly
            as a template for the user-written program. It is there so
            you can determine its members.

rcvbuf      used as a buffer to receive a message (line 13); it uses the
            msgbuf1 data structure as a template (lines 10-13)

*msgp       used as a pointer (line 13) to both the sndbuf and rcvbuf
            buffers

i           used as a counter for inputting characters from the
            keyboard, storing them in the array, and keeping track of the
            message length for the msgsnd() system call; it is also used
            as a counter to output the received message for the msgrcv()
            system call

c           used to receive the input character from the getchar()
            function (line 50)

flag        used to store the code of IPC_NOWAIT for the msgsnd()
            system call (line 61)

flags       used to store the code of the IPC_NOWAIT or
            MSG_NOERROR flags for the msgrcv() system call (line 117)

choice      used to store the code for sending or receiving (line 30)

rtrn        used to store the return values from all system calls

msqid       used to store and pass the desired message queue identifier
            for both system calls

msgsz       used to store and pass the size of the message to be sent or
            received

msgflg      used to pass the value of flag for sending or the value of
            flags for receiving

msgtyp      used for specifying the message type for sending, or used to
            pick a message type for receiving.

Note that a msqid_ds data structure is set up in the program (line 21)
with a pointer which is initialized to point to it (line 22); this will allow the
data structure members that are affected by message operations to be
observed. They are observed by using the msgctl() (IPC_STAT) system call
to get them for the program to print them out (lines 80-92 and lines
161-168).

The first thing the program prompts for is whether to send or receive a
message. A corresponding code must be entered for the desired operation,
and it is stored at the address of the choice variable (lines 23-30).
Depending upon the code, the program proceeds as in the following msgsnd or
msgrcv sections.

msgsnd 

When the code is to send a message, the msgp pointer is initialized (line
33) to the address of the send data structure, sndbuf. Next, a message type
must be entered for the message; it is stored at the address of the variable
msgtyp (line 42), and then (line 43) it is put into the mtype member of the
data structure pointed to by msgp.

The program now prompts for a message to be entered from the
keyboard and enters a loop of getting and storing into the mtext array of the
data structure (lines 48-51). This will continue until an end of file is
recognized which for the getchar() function is a control-d (CTRL-D) immediately
following a carriage return (<CR>).

The message is immediately echoed from the mtext array of the sndbuf
data structure to provide feedback (lines 54-56).

The next and final thing that must be decided is whether to set the
IPC_NOWAIT flag. The program does this by requesting that a code of a 1
be entered for yes or anything else for no (lines 57-65). It is stored at the
address of the flag variable. If a 1 is entered, IPC_NOWAIT is logically
ORed with msgflg; otherwise, msgflg is set to zero.

The msgsnd() system call is performed (line 69). If it is unsuccessful, a
failure message is displayed along with the error number (lines 70-72). If it
is successful, the returned value is printed which should be zero (lines
73-76).

Every time a message is successfully sent, there are three members of 
the associated data structure which are updated. They are described as fol- 
lows: 

msg_qnum    represents the total number of messages on the message
            queue; it is incremented by one.

msg_lspid   contains the Process Identification (PID) number of the last
            process sending a message; it is set accordingly.

msg_stime   contains the time in seconds since January 1, 1970,
            Greenwich Mean Time (GMT) of the last message sent; it is
            set accordingly.

These members are displayed after every successful message send opera- 
tion (lines 79-92). 

msgrcv 

If the code specifies that a message is to be received, the program con- 
tinues execution as in the following paragraphs. 

The msgp pointer is initialized to the rcvbuf data structure (line 99). 

Next, the message queue identifier of the message queue from which to 
receive the message is requested, and it is stored at the address of msqid 
(lines 100-103). 

The message type is requested, and it is stored at the address of msgtyp 
(lines 104-107). 

The code for the desired combination of control flags is requested next, 
and it is stored at the address of flags (lines 108-117). Depending upon the 
selected combination, msgflg is set accordingly (lines 118-133). 

Finally, the number of bytes to be received is requested, and it is stored 
at the address of msgsz (lines 134-137). 

The msgrcv() system call is performed (line 144). If it is unsuccessful, a
message and the error number are displayed (lines 145-148). If successful, a
message indicates so, and the number of bytes returned and the msg type
returned (because the value returned may be different from the value
requested) are displayed, followed by the received message (lines 153-159).

When a message is successfully received, there are three members of the 
associated data structure which are updated; they are described as follows: 

msg_qnum    contains the number of messages on the message queue; it is
            decremented by one.

msg_lrpid   contains the process identification (PID) of the last process
            receiving a message; it is set accordingly.

msg_rtime   contains the time in seconds since January 1, 1970,
            Greenwich Mean Time (GMT) that the last process received
            a message; it is set accordingly.

The example program for the msgop() system calls follows. It is
suggested that the program be put into a source file called msgop.c and then
into an executable file called msgop.



  1   /*This is a program to illustrate
  2   **the message operations, msgop(),
  3   **system call capabilities.
  4   */
  5   /*Include necessary header files.*/
  6   #include <stdio.h>
  7   #include <sys/types.h>
  8   #include <sys/ipc.h>
  9   #include <sys/msg.h>
 10   struct msgbuf1 {
 11       long mtype;
 12       char mtext[8192];
 13   } sndbuf, rcvbuf, *msgp;
 14   /*Start of main C language program*/
 15   main()
 16   {
 17       extern int errno;
 18       int i, c, flag, flags, choice;
 19       int rtrn, msqid, msgsz, msgflg = 0;
 20       long mtype, msgtyp;
 21       struct msqid_ds msqid_ds, *buf;
 22       buf = &msqid_ds;
 23       /*Select the desired operation.*/
 24       printf("Enter the corresponding\n");
 25       printf("code to send or\n");
 26       printf("receive a message:\n");
 27       printf("Send    = 1\n");
 28       printf("Receive = 2\n");
 29       printf("Entry   = ");
 30       scanf("%d", &choice);
 31       if(choice == 1)    /*Send a message.*/
 32       {
 33           msgp = &sndbuf;    /*Point to user send structure.*/
 34           printf("\nEnter the msqid of\n");
 35           printf("the message queue to\n");
 36           printf("handle the message = ");
 37           scanf("%d", &msqid);
 38           /*Set the message type.*/
 39           printf("\nEnter a positive integer\n");
 40           printf("message type (long) for the\n");
 41           printf("message = ");
 42           scanf("%ld", &msgtyp);
 43           msgp->mtype = msgtyp;
 44           /*Enter the message to send.*/
 45           printf("\nEnter a message: \n");
 46           /*A control-d (^d) terminates as
 47             EOF.*/
 48           /*Get each character of the message
 49             and put it in the mtext array.*/
 50           for(i = 0; ((c = getchar()) != EOF); i++)
 51               sndbuf.mtext[i] = c;
 52           /*Determine the message size.*/
 53           msgsz = i + 1;
 54           /*Echo the message to send.*/
 55           for(i = 0; i < msgsz; i++)
 56               putchar(sndbuf.mtext[i]);
 57           /*Set the IPC_NOWAIT flag if
 58             desired.*/
 59           printf("\nEnter a 1 if you want the\n");
 60           printf("the IPC_NOWAIT flag set: ");
 61           scanf("%d", &flag);
 62           if(flag == 1)
 63               msgflg |= IPC_NOWAIT;
 64           else
 65               msgflg = 0;
 66           /*Check the msgflg.*/
 67           printf("\nmsgflg = 0%o\n", msgflg);
 68           /*Send the message.*/
 69           rtrn = msgsnd(msqid, msgp, msgsz, msgflg);
 70           if(rtrn == -1)
 71               printf("\nMsgsnd failed. Error = %d\n",
 72                   errno);
 73           else {
 74               /*Print the value of test which
 75                 should be zero for successful.*/
 76               printf("\nValue returned = %d\n", rtrn);
 77               /*Print the size of the message
 78                 sent.*/
 79               printf("\nMsgsz = %d\n", msgsz);
 80               /*Check the data structure update.*/
 81               msgctl(msqid, IPC_STAT, buf);
 82               /*Print out the affected members.*/
 83               /*Print the incremented number of
 84                 messages on the queue.*/
 85               printf("\nThe msg_qnum = %d\n",
 86                   buf->msg_qnum);
 87               /*Print the process id of the last sender.*/
 88               printf("The msg_lspid = %d\n",
 89                   buf->msg_lspid);
 90               /*Print the last send time.*/
 91               printf("The msg_stime = %d\n",
 92                   buf->msg_stime);
 93           }
 94       }
 95       if(choice == 2)    /*Receive a message.*/
 96       {
 97           /*Initialize the message pointer
 98             to the receive buffer.*/
 99           msgp = &rcvbuf;
100           /*Specify the message queue which contains
101             the desired message.*/
102           printf("\nEnter the msqid = ");
103           scanf("%d", &msqid);
104           /*Specify the specific message on the queue
105             by using its type.*/
106           printf("\nEnter the msgtyp = ");
107           scanf("%ld", &msgtyp);
108           /*Configure the control flags for the
109             desired actions.*/
110           printf("\nEnter the corresponding code\n");
111           printf("to select the desired flags: \n");
112           printf("No flags                   = 0\n");
113           printf("MSG_NOERROR                = 1\n");
114           printf("IPC_NOWAIT                 = 2\n");
115           printf("MSG_NOERROR and IPC_NOWAIT = 3\n");
116           printf("              Flags        = ");
117           scanf("%d", &flags);
118           switch(flags) {
119               /*Set msgflg by ORing it with the appropriate
120                 flags (constants).*/
121           case 0:
122               msgflg = 0;
123               break;
124           case 1:
125               msgflg |= MSG_NOERROR;
126               break;
127           case 2:
128               msgflg |= IPC_NOWAIT;
129               break;
130           case 3:
131               msgflg |= MSG_NOERROR | IPC_NOWAIT;
132               break;
133           }
134           /*Specify the number of bytes to receive.*/
135           printf("\nEnter the number of bytes\n");
136           printf("to receive (msgsz) = ");
137           scanf("%d", &msgsz);
138           /*Check the values for the arguments.*/
139           printf("\nmsqid =%d\n", msqid);
140           printf("\nmsgtyp = %ld\n", msgtyp);
141           printf("\nmsgsz = %d\n", msgsz);
142           printf("\nmsgflg = 0%o\n", msgflg);
143           /*Call msgrcv to receive the message.*/
144           rtrn = msgrcv(msqid, msgp, msgsz, msgtyp, msgflg);
145           if(rtrn == -1) {
146               printf("\nMsgrcv failed. ");
147               printf("Error = %d\n", errno);
148           }
149           else {
150               printf("\nMsgrcv was successful\n");
151               printf("for msqid = %d\n",
152                   msqid);
153               /*Print the number of bytes received,
154                 it is equal to the return
155                 value.*/
156               printf("Bytes received = %d\n", rtrn);
157               /*Print the received message.*/
158               for(i = 0; i<=rtrn; i++)
159                   putchar(rcvbuf.mtext[i]);
160           }
161           /*Check the associated data structure.*/
162           msgctl(msqid, IPC_STAT, buf);
163           /*Print the decremented number of messages.*/
164           printf("\nThe msg_qnum = %d\n", buf->msg_qnum);
165           /*Print the process id of the last receiver.*/
166           printf("The msg_lrpid = %d\n", buf->msg_lrpid);
167           /*Print the last message receive time*/
168           printf("The msg_rtime = %d\n", buf->msg_rtime);
169       }
170   }



Figure 9-6: msgop() System Call Example
The semaphore type of IPC allows processes to communicate through
the exchange of semaphore values. Since many applications require the use
of more than one semaphore, the UNIX operating system has the ability to
create sets or arrays of semaphores. A semaphore set can contain one or
more semaphores up to a limit set by the system administrator. The tunable
parameter that sets this limit, SEMMSL, has a default value of 25.
Semaphore sets are created by using the semget(2) system call.

The process performing the semget(2) system call becomes the
owner/creator, determines how many semaphores are in the set, and sets
the operation permissions for all processes, including itself. This process
can subsequently relinquish ownership of the set or change the operation
permissions using the semctl(2) (semaphore control) system call. The
creating process remains the creator for as long as the facility exists.
Other processes with permission can use semctl() to perform other control
functions.

Any process can manipulate the semaphore(s) if the owner of the sema- 
phore grants permission. Each semaphore within a set can be manipulated 
in two ways with the semop(2) system call (which is documented in the 
Programmer's Reference Manual): 

■ incremented 

■ decremented 

To increment a semaphore, a positive integer value of the desired
magnitude is passed to the semop(2) system call. To decrement a semaphore,
a negative value of the desired magnitude is passed.
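
The sign convention just described can be sketched in C as follows. This is a minimal illustration, not the guide's own example: the helper names make_op and change_by are hypothetical, while struct sembuf and its members sem_num, sem_op, and sem_flg are assumed to be those declared in <sys/sem.h>.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical helper: describe one operation on semaphore number
   semnum. A positive delta increments, a negative delta decrements. */
struct sembuf make_op(unsigned short semnum, short delta)
{
    struct sembuf op;

    op.sem_num = semnum;   /* which semaphore in the set */
    op.sem_op  = delta;    /* > 0 increment, < 0 decrement */
    op.sem_flg = 0;        /* no flags: the call may block */
    return op;
}

/* Hypothetical helper: apply delta to one semaphore of the set semid.
   Returns 0 on success and -1 (with errno set) on failure. */
int change_by(int semid, unsigned short semnum, short delta)
{
    struct sembuf op = make_op(semnum, delta);

    return semop(semid, &op, 1);
}
```

A call such as change_by(semid, 0, 3) would add 3 to semaphore number 0, and change_by(semid, 0, -3) would take the same amount back.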

The UNIX operating system ensures that only one process can manipu- 
late a semaphore set at any given time. Simultaneous requests are per- 
formed sequentially in an arbitrary manner. 

A process can test whether a semaphore value is greater than a certain
value by attempting to decrement the semaphore by one more than that
value. If the attempt succeeds, the semaphore value was greater than
that value; otherwise, it was not. While doing this, the process can have
its execution suspended (IPC_NOWAIT flag not set) until the semaphore
value permits the operation (because other processes have incremented
the semaphore) or until the semaphore facility is removed.
The ability to suspend execution is called a "blocking semaphore
operation." This ability is also available to a process that is testing for
a semaphore value to become equal to zero; only read permission is
required for this test, and it is accomplished by passing a value of zero
to the semop(2) system call.

On the other hand, if the operation cannot be completed and the process
has asked not to have its execution suspended (IPC_NOWAIT flag set), it is
called a "nonblocking semaphore operation." In this case, a known error
code (-1) is returned to the process, and the external errno variable is
set accordingly.
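
A nonblocking operation might look like the sketch below. The helper name try_decrement and its return convention are illustrative assumptions; the sketch also assumes that the error reported for an operation that would have blocked is EAGAIN, as on current systems.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical helper: try to decrement semaphore semnum by one
   without suspending. Returns 1 if the decrement was done, 0 if
   it would have blocked, and -1 for any other error. */
int try_decrement(int semid, unsigned short semnum)
{
    struct sembuf op;

    op.sem_num = semnum;
    op.sem_op  = -1;
    op.sem_flg = IPC_NOWAIT;     /* nonblocking operation */

    if (semop(semid, &op, 1) == 0)
        return 1;
    if (errno == EAGAIN)         /* value too small to decrement */
        return 0;
    return -1;
}
```

Leaving IPC_NOWAIT out of sem_flg turns the same request into a blocking semaphore operation.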

The blocking semaphore operation allows processes to communicate 
based on the values of semaphores at different points in time. Remember 
also that IPC facilities remain in the UNIX operating system until removed 
by a permitted process or until the system is reinitialized. 

Operating on a semaphore set is done by using the semop(2), sema- 
phore operation, system call. 

When a set of semaphores is created, the first semaphore in the set is 
semaphore number zero. The last semaphore number in the set is one less 
than the total in the set. 

An array of these "blocking/nonblocking operations" can be performed
on a set containing more than one semaphore. When performing an array
of operations, the operations can be applied to any or all of the
semaphores in the set, in any order of semaphore number. However, no
operations are done until they can all be done successfully. This
requirement means that changes already worked out for preceding
semaphores in the array must be discarded when a "blocking semaphore
operation" on a semaphore in the set cannot be completed; no changes are
made until they can all be made. For example, if a process could complete
three of six operations on a set of ten semaphores but is "blocked" from
performing the fourth operation, no changes are made to the set until the
fourth and remaining operations can be performed successfully.
Additionally, any operation in the array, including the blocked operation,
can specify that once all operations have been performed successfully, the
operation is to be undone later (when the process exits). In the end,
either all of the operations are performed and the semaphores are changed,
or one "nonblocking operation" is unsuccessful and none are changed. This
all-or-nothing behavior is commonly referred to as being "atomically
performed."
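
The all-or-nothing rule can be observed with a small experiment, sketched here under some assumptions. The names sem_arg, take_both, and value_after_failed_request are hypothetical; the union is a local stand-in for the argument that semctl(2) expects, declared by the caller because some systems do not declare it in <sys/sem.h>.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Local stand-in for the semctl() argument union (hypothetical name). */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Try to take one unit from semaphore 0 and one from semaphore 1
   in a single atomic, nonblocking request. */
int take_both(int semid)
{
    struct sembuf ops[2];

    ops[0].sem_num = 0;  ops[0].sem_op = -1;  ops[0].sem_flg = IPC_NOWAIT;
    ops[1].sem_num = 1;  ops[1].sem_op = -1;  ops[1].sem_flg = IPC_NOWAIT;
    return semop(semid, ops, 2);
}

/* Build a private two-semaphore set holding {1, 0}, attempt
   take_both(), and report the value of semaphore 0 afterwards.
   Returns -1 if the set could not be prepared. */
int value_after_failed_request(void)
{
    unsigned short start[2] = { 1, 0 };
    union sem_arg arg;
    int semid, val;

    semid = semget(IPC_PRIVATE, 2, 0600);
    if (semid == -1)
        return -1;
    arg.array = start;
    if (semctl(semid, 0, SETALL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    take_both(semid);                  /* semaphore 1 is 0, so this fails */
    val = semctl(semid, 0, GETVAL);    /* semaphore 0 was not touched */
    semctl(semid, 0, IPC_RMID);        /* remove the private set */
    return val;
}
```

Although the first operation in the array could have been done by itself, semaphore 0 keeps its starting value because the second operation cannot be done.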
The ability to undo operations requires the UNIX operating system to
maintain an array of "undo structures" corresponding to the array of sema- 
phore operations to be performed. Each semaphore operation which is to 
be undone has an associated adjust variable used for undoing the operation, 
if necessary. 
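
The effect of the adjust variable can be sketched with a child process that changes a semaphore and then exits. This is an illustration under assumptions: the flag that requests undoing is taken to be SEM_UNDO from <sys/sem.h> (the text above does not name it), and the names sem_arg and value_after_child_exit are hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* Local stand-in for the semctl() argument union (hypothetical name). */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* A child adds 2 to a semaphore that starts at 5, asking for the
   operation to be undone, and then exits. The parent reads the value
   back. Returns that value, or -1 if the experiment could not run. */
int value_after_child_exit(void)
{
    union sem_arg arg;
    struct sembuf op;
    int semid, val;
    pid_t pid;

    semid = semget(IPC_PRIVATE, 1, 0600);
    if (semid == -1)
        return -1;
    arg.val = 5;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    pid = fork();
    if (pid == 0) {
        op.sem_num = 0;
        op.sem_op  = 2;
        op.sem_flg = SEM_UNDO;     /* record an adjustment of -2 */
        semop(semid, &op, 1);
        _exit(0);                  /* the adjustment is applied on exit */
    }
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    val = semctl(semid, 0, GETVAL);
    semctl(semid, 0, IPC_RMID);
    return val;
}
```

If the system applies the adjustment as described, the parent sees the starting value again rather than the value the child left behind.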

Remember, any unsuccessful "nonblocking operation" for a single sema- 
phore or a set of semaphores causes immediate return with no operations 
performed at all. When this occurs, a known error code (-1) is returned to
the process, and the external variable errno is set accordingly. 

System calls make these semaphore capabilities available to
processes. The calling process passes arguments to a system call, and the sys-
tem call either successfully or unsuccessfully performs its function. If the
system call is successful, it performs its function and returns the appropriate 
information. Otherwise, a known error code (-1) is returned to the process, 
and the external variable errno is set accordingly. 



Using Semaphores 

Before semaphores can be used (operated on or controlled) a uniquely 
identified data structure and semaphore set (array) must be created. The 
unique identifier is called the semaphore identifier (semid); it is used to 
identify or reference a particular data structure and semaphore set. 

The semaphore set contains a predefined number of structures in an 
array, one structure for each semaphore in the set. The number of sema- 
phores (nsems) in a semaphore set is user selectable. The following 
members are in each structure within a semaphore set: 

■ semaphore text map address 

■ process identification (PID) performing last operation 

■ number of processes awaiting the semaphore value to become greater 
than its current value 

■ number of processes awaiting the semaphore value to equal zero 

There is one associated data structure for the uniquely identified sema- 
phore set. This data structure contains information related to the sema- 
phore set as follows: 
■ operation permissions data (operation permissions structure)

■ pointer to first semaphore in the set (array) 

■ number of semaphores in the set 

■ last semaphore operation time 

■ last semaphore change time 

The C Programming Language data structure definition for the sema- 
phore set (array member) is as follows: 



struct sem
{
    ushort  semval;     /* semaphore text map address */
    short   sempid;     /* pid of last operation */
    ushort  semncnt;    /* # awaiting semval > cval */
    ushort  semzcnt;    /* # awaiting semval = 0 */
};



It is located in the #include <sys/sem.h> header file. 

Likewise, the structure definition for the associated semaphore data 
structure is as follows: 
struct semid_ds
{
    struct ipc_perm sem_perm;    /* operation permission struct */
    struct sem      *sem_base;   /* ptr to first semaphore in set */
    ushort          sem_nsems;   /* # of semaphores in set */
    time_t          sem_otime;   /* last semop time */
    time_t          sem_ctime;   /* last change time */
};





It is also located in the #include <sys/sem.h> header file. Note that 
the sem_perm member of this structure uses ipc_perm as a template. The 
breakout for the operation permissions data structure is shown in Figure 



The ipc_perm data structure is the same for all IPC facilities, and it is 
located in the #include <sys/ipc.h> header file. It is shown in the "Mes- 
sages" section. 

The semget(2) system call is used to perform two tasks: 

■ to get a new semid and create an associated data structure and sema- 
phore set for it 

■ to return an existing semid that already has an associated data struc- 
ture and semaphore set 

The task performed is determined by the value of the key argument passed 
to the semget(2) system call. For the first task, if the key is not already in 
use for an existing semid and the IPC_CREAT flag is set, a new semid is 
returned with an associated data structure and semaphore set created for it 
provided no system tunable parameter would be exceeded. 

There is also a provision for specifying a key of value zero (0) which is 
known as the private key (IPC_PRIVATE = 0); when specified, a new semid 
is always returned with an associated data structure and semaphore set 
created for it unless a system tunable parameter would be exceeded. When 
the ipcs command is performed, the KEY field for the semid is all zeros. 
When performing the first task, the process which calls semget()
becomes the owner/creator, and the associated data structure is initialized
accordingly. Remember, ownership can be changed, but the creating pro- 
cess always remains the creator; see the "Controlling Semaphores" section 
in this chapter. The creator of the semaphore set also determines the initial 
operation permissions for the facility. 

For the second task, if a semid exists for the key specified, the value of
the existing semid is returned. If it is not desired to have an existing semid
returned, a control command (IPC_EXCL) can be specified (set) in the
semflg argument passed to the system call. The system call will fail if it is
passed a value for the number of semaphores (nsems) that is greater than
the number actually in the set; if you do not know how many semaphores
are in the set, use 0 for nsems. The details of using this system call are dis-
cussed in the "Using semget" section of this chapter.
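
Looking up an existing set can be sketched as follows. The helper names find_set and find_set_of are hypothetical and are used only for illustration.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical helper: return the semid already associated with key,
   or -1 if there is none. nsems is 0 because the size of the
   existing set does not need to be known, and IPC_CREAT is not
   set, so nothing is ever created. */
int find_set(key_t key)
{
    return semget(key, 0, 0);
}

/* Hypothetical helper: as find_set(), but also require that the set
   holds at least nsems semaphores; the call fails if it holds fewer. */
int find_set_of(key_t key, int nsems)
{
    return semget(key, nsems, 0);
}
```

For a set that was created with two semaphores, find_set_of(key, 2) would return the semid, while find_set_of(key, 3) would return -1.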

Once a uniquely identified semaphore set and data structure are created, 
semaphore operations [semop(2)] and semaphore control [semctl()] can be 
used. 

Semaphore operations consist of incrementing, decrementing, and test- 
ing for zero. A single system call is used to perform these operations. It is 
called semop(). Refer to the "Operations on Semaphores" section in this
chapter for details of this system call. 

Semaphore control is done by using the semctl(2) system call. These 
control operations permit you to control the semaphore facility in the fol- 
lowing ways: 

■ to return the value of a semaphore 

■ to set the value of a semaphore 

■ to return the process identification (PID) of the last process perform- 
ing an operation on a semaphore set 

■ to return the number of processes waiting for a semaphore value to 
become greater than its current value 

■ to return the number of processes waiting for a semaphore value to 
equal zero 

■ to get all semaphore values in a set and place them in an array in 
user memory 
■ to set all semaphore values in a semaphore set from an array of
values in user memory 

■ to place all data structure member values, status, of a semaphore set 
into user memory area 

■ to change operation permissions for a semaphore set 

■ to remove a particular semid from the UNIX operating system along 
with its associated data structure and semaphore set 

Refer to the "Controlling Semaphores" section in this chapter for details 
of the semctl(2) system call. 



Getting Semaphores 

This section contains a detailed description of using the semget(2) sys- 
tem call along with an example program illustrating its use. 

Using semget 

The synopsis found in the semget(2) entry in the Programmer's Reference 
Manual is as follows: 



#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int semget (key, nsems, semflg)
key_t key;
int nsems, semflg;



The following line in the synopsis:

int semget (key, nsems, semflg)

informs you that semget() is a function with three formal arguments that
returns an integer type value (semid) upon successful completion. The
next two lines:

key_t key;
int nsems, semflg;

declare the types of the formal arguments. key_t is declared by a typedef
in the types.h header file to be an integer.

The integer returned from this system call upon successful completion 
is the semaphore set identifier (semid) that was discussed above. 

As declared, the process calling the semget() system call must supply 
three actual arguments to be passed to the formal key, nsems, and semflg 
arguments. 

A new semid with an associated semaphore set and data structure is 
provided if either 

■ key is equal to IPC_PRIVATE, 

or 

■ key is passed a unique hexadecimal integer, and semflg ANDed with
IPC_CREAT is TRUE.

The value passed to the semflg argument must be an integer type octal 
value and will specify the following: 

■ access permissions

■ execution modes 

■ control fields (commands) 

Access permissions determine the read/alter attributes, and execution
modes determine the user/group/other attributes of the semflg argument.
They are collectively referred to as "operation permissions." Figure 9-7 
reflects the numeric values (expressed in octal notation) for the valid opera- 
tion permissions codes. 
    Operation Permissions    Octal Value

    Read by User             00400
    Alter by User            00200
    Read by Group            00040
    Alter by Group           00020
    Read by Others           00004
    Alter by Others          00002


Figure 9-7: Operation Permissions Codes 



A specific octal value is derived by adding the octal values for the 
operation permissions desired. That is, if read by user and read/alter by 
others is desired, the code value would be 00406 (00400 plus 00006). There 
are constants #define'd in the sem.h header file which can be used for the 
user (OWNER). They are as follows: 

SEM_A   0200    /* alter permission by owner */
SEM_R   0400    /* read permission by owner */

Control commands are predefined constants (represented by all upper-
case letters). Figure 9-8 contains the names of the constants which apply to
the semget(2) system call along with their values. They are also referred to
as flags and are defined in the ipc.h header file.

    Control Command    Value

    IPC_CREAT          0001000
    IPC_EXCL           0002000

Figure 9-8: Control Commands (Flags)

The value for semflg is, therefore, a combination of operation permis-
sions and control commands. After determining the value for the operation
permissions as previously described, the desired flag(s) can be specified.
This specification is accomplished by bitwise ORing (|) them with the opera-
tion permissions; the bit positions and values for the control commands in
relation to those of the operation permissions make this possible. It is
illustrated as follows:





                          Octal Value    Binary Value

    IPC_CREAT           =   01000        000 001 000 000 000
    | ORed by User      =   00400        000 000 100 000 000

    semflg              =   01400        000 001 100 000 000



The semflg value can be easily set by using the names of the flags in 
conjunction with the octal operation permissions value: 

semid = semget (key, nsems, (IPC_CREAT | 0400));

semid = semget (key, nsems, (IPC_CREAT | IPC_EXCL | 0400));
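
The same combination can be wrapped in a small function, sketched here for illustration. The name build_semflg and its parameters are hypothetical; only IPC_CREAT and IPC_EXCL come from the ipc.h header file.

```c
#include <sys/types.h>
#include <sys/ipc.h>

/* Hypothetical helper: combine an octal operation permissions code
   with the control commands selected by create and excl (each is
   treated as true when nonzero). */
int build_semflg(int opperm, int create, int excl)
{
    int semflg = opperm & 0777;   /* keep only the permission bits */

    if (create)
        semflg |= IPC_CREAT;      /* bitwise OR in the control command */
    if (excl)
        semflg |= IPC_EXCL;
    return semflg;
}
```

With this sketch, build_semflg(0400, 1, 0) yields the same value as the expression (IPC_CREAT | 0400) in the first call above.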

As specified by the semget(2) entry in the Programmer's Reference Manual, 
success or failure of this system call depends upon the actual argument 
values for key, nsems, semflg or system tunable parameters. The system 
call will attempt to return a new semid if one of the following conditions is 
true: 

■ key is equal to IPC_PRIVATE (0)

■ key does not already have a semid associated with it, and (semflg &
IPC_CREAT) is "true" (not zero).

The key argument can be set to IPC_PRIVATE in the following ways: 
semid = semget (IPC_PRIVATE, nsems, semflg);

or

semid = semget (0, nsems, semflg);

This alone will cause the system call to be attempted because it satisfies the 
first condition specified. 

Exceeding the SEMMNI, SEMMNS, or SEMMSL system tunable parame- 
ters will always cause a failure. The SEMMNI system tunable parameter 
determines the maximum number of unique semaphore sets (semid's) in the 
UNIX operating system. The SEMMNS system tunable parameter deter- 
mines the maximum number of semaphores in all semaphore sets system 
wide. The SEMMSL system tunable parameter determines the maximum 
number of semaphores in each semaphore set.

The second condition is satisfied if the value for key is not already asso-
ciated with a semid, and the bitwise AND of semflg and IPC_CREAT is
"true" (not zero). This means that the key is unique (not in use) within the
UNIX operating system for this facility type and that the IPC_CREAT flag is
set (semflg | IPC_CREAT). SEMMNI, SEMMNS, and SEMMSL apply here
also, just as for condition one.

IPC_EXCL is another control command. Used in conjunction with
IPC_CREAT, it causes the system call to fail if, and only if, a semid
already exists for the specified key. This is necessary to prevent the
process from thinking that it has received a new (unique) semid when it has
not. In other words, when both IPC_CREAT and IPC_EXCL are specified, a
new semid is returned if the system call is successful. Any value for semflg
returns a new semid if the key equals zero (IPC_PRIVATE) and no system
tunable parameters are exceeded.
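
An exclusive create might be coded as in the sketch below. The helper name create_new_set and its return convention are hypothetical; the sketch assumes that the error reported when the key is already in use is EEXIST.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical helper: create a brand-new set for key. Returns the
   new semid, -2 if a semid already exists for key, or -1 for any
   other failure (errno tells which). */
int create_new_set(key_t key, int nsems, int opperm)
{
    int semid;

    semid = semget(key, nsems, IPC_CREAT | IPC_EXCL | opperm);
    if (semid == -1 && errno == EEXIST)
        return -2;               /* key in use: nothing was created */
    return semid;
}
```

A process that receives -2 knows it did not create the set and can decide whether to use the existing one.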

Refer to the semget(2) manual page for specific associated data structure 
initialization for successful completion. 

Example Program 

The example program in this section (Figure 9-9) is a menu driven pro- 
gram which allows all possible combinations of using the semget(2) system 
call to be exercised. 

From studying this program, you can observe the method of passing 
arguments and receiving return values. The user-written program require- 
ments are pointed out. 

This program begins (lines 4-8) by including the required header files as 
specified by the semget(2) entry in the Programmer's Reference Manual. Note 
that the errno.h header file is included as opposed to declaring errno as an 
external variable; either method will work. 

Variable names have been chosen to be as close as possible to those in 
the synopsis. Their declarations are self-explanatory. These names make 
the program more readable, and this is perfectly legal since they are local to 
the program. The variables declared for this program and their purpose are 
as follows: 

■ key — used to pass the value for the desired key 
■ opperm — used to store the desired operation permissions

■ flags — used to store the desired control commands (flags)

■ opperm_flags — used to store the combination from the bitwise ORing
of the opperm and flags variables; it is then used in the system call
to pass the semflg argument

■ semid— used for returning the semaphore set identification number 
for a successful system call or the error code (-1) for an unsuccessful 
one. 

The program begins by prompting for a hexadecimal key, an octal 
operation permissions code, and the control command combinations (flags) 
which are selected from a menu (lines 15-32). All possible combinations are 
allowed even though they might not be viable. This allows observing the 
errors for illegal combinations. 

Next, the menu selection for the flags is combined with the operation
permissions, and the result is stored at the address of the opperm_flags
variable (lines 36-52).

Then, the number of semaphores for the set is requested (lines 53-57), 
and its value is stored at the address of nsems. 

The system call is made next, and the result is stored at the address of 
the semid variable (lines 60, 61). 

Since the semid variable now contains a valid semaphore set identifier 
or the error code (-1), it is tested to see if an error occurred (line 63). If 
semid equals -1, a message indicates that an error resulted and the external
errno variable is displayed (lines 65, 66). Remember that the external errno 
variable is only set when a system call fails; it should only be tested 
immediately following system calls. 

If no error occurred, the returned semaphore set identifier is displayed 
(line 70). 

The example program for the semget(2) system call follows. It is sug- 
gested that the source program file be named semget.c and that the execut- 
able file be named semget. 
1   /*This is a program to illustrate
2   **the semaphore get, semget(),
3   **system call capabilities.*/
4   #include    <stdio.h>
5   #include    <sys/types.h>
6   #include    <sys/ipc.h>
7   #include    <sys/sem.h>
8   #include    <errno.h>
9   /*Start of main C language program*/
10  main()
11  {
12      key_t key;  /*declare as long integer*/
13      int opperm, flags, nsems;
14      int semid, opperm_flags;
15      /*Enter the desired key*/
16      printf("\nEnter the desired key in hex = ");
17      scanf("%x", &key);
18      /*Enter the desired octal operation
19        permissions.*/
20      printf("\nEnter the operation\n");
21      printf("permissions in octal = ");
22      scanf("%o", &opperm);
23      /*Set the desired flags.*/
24      printf("\nEnter corresponding number to\n");
25      printf("set the desired flags:\n");
26      printf("No flags                  = 0\n");
27      printf("IPC_CREAT                 = 1\n");
28      printf("IPC_EXCL                  = 2\n");
29      printf("IPC_CREAT and IPC_EXCL    = 3\n");
30      printf("            Flags         = ");
31      /*Get the flags to be set.*/
32      scanf("%d", &flags);
33      /*Error checking (debugging)*/
34      printf("\nkey =0x%x, opperm = 0%o, flags = 0%o\n",
35          key, opperm, flags);
36      /*Incorporate the control fields (flags) with
37        the operation permissions.*/
38      switch (flags)
39      {
40      case 0:    /*No flags are to be set.*/
41          opperm_flags = (opperm | 0);
42          break;
43      case 1:    /*Set the IPC_CREAT flag.*/
44          opperm_flags = (opperm | IPC_CREAT);
45          break;
46      case 2:    /*Set the IPC_EXCL flag.*/
47          opperm_flags = (opperm | IPC_EXCL);
48          break;
49      case 3:    /*Set the IPC_CREAT and IPC_EXCL
50                   flags.*/
51          opperm_flags = (opperm | IPC_CREAT | IPC_EXCL);
52      }
53      /*Get the number of semaphores for this set.*/
54      printf("\nEnter the number of\n");
55      printf("desired semaphores for\n");
56      printf("this set (25 max) = ");
57      scanf("%d", &nsems);
58      /*Check the entry.*/
59      printf("\nNsems = %d\n", nsems);
60      /*Call the semget system call.*/
61      semid = semget(key, nsems, opperm_flags);
62      /*Perform the following if the call is unsuccessful.*/
63      if(semid == -1)
64      {
65          printf("The semget system call failed!\n");
66          printf("The error number = %d\n", errno);
67      }
68      /*Return the semid upon successful completion.*/
69      else
70          printf("\nThe semid = %d\n", semid);
71      exit(0);
72  }



Figure 9-9: semget() System Call Example 



Controlling Semaphores 

This section contains a detailed description of using the semctl(2) sys- 
tem call along with an example program which allows all of its capabilities 
to be exercised. 

Using semctl 

The synopsis found in the semctl(2) entry in the Programmer's Reference 
Manual is as follows: 
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int semctl (semid, semnum, cmd, arg)
int semid, cmd;
int semnum;
union semun
{
    int val;
    struct semid_ds *buf;
    ushort array[];
} arg;



The semctl(2) system call requires four arguments to be passed to it, and it 
returns an integer value. 

The semid argument must be a valid, non-negative, integer value that
has already been created by using the semget(2) system call.

The semnum argument is used to select a semaphore by its number. 
This relates to array (atomically performed) operations on the set. When a 
set of semaphores is created, the first semaphore is number 0, and the last 
semaphore has the number of one less than the total in the set. 

The cmd argument can be replaced by one of the following control 
commands (flags): 

■ GETVAL— return the value of a single semaphore within a sema- 
phore set 

■ SETVAL— set the value of a single semaphore within a semaphore set 

■ GETPID— return the Process Identifier (PID) of the process that per- 
formed the last operation on the semaphore within a semaphore set 

■ GETNCNT— return the number of processes waiting for the value of 
a particular semaphore to become greater than its current value 
■ GETZCNT — return the number of processes waiting for the value of
a particular semaphore to be equal to zero 

■ GETALL— return the values for all semaphores in a semaphore set 

■ SETALL — set all semaphore values in a semaphore set

■ IPC_STAT — return the status information contained in the associated 
data structure for the specified semid, and place it in the data struc- 
ture pointed to by the *buf pointer in the user memory area; arg.buf 
is the union member that contains the value of buf 

■ IPC_SET— for the specified semaphore set (semid), set the effective 
user /group identification and operation permissions 

■ IPC_RMID — remove the specified (semid) semaphore set along with 
its associated data structure. 

A process must have an effective user identification of
OWNER/CREATOR or super-user to perform an IPC_SET or IPC_RMID con-
trol command. Read/alter permission is required as applicable for the other
control commands.
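
Three of the control commands listed above can be sketched as thin wrappers around semctl(2). The names sem_arg, set_value, get_value, and remove_set are hypothetical; the union is a local stand-in for the arg argument, declared by the caller because some systems do not declare it in <sys/sem.h>.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Local stand-in for the semctl() argument union (hypothetical name). */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* SETVAL: set one semaphore to value. Returns 0 or -1. */
int set_value(int semid, int semnum, int value)
{
    union sem_arg arg;

    arg.val = value;                       /* the val member carries it */
    return semctl(semid, semnum, SETVAL, arg);
}

/* GETVAL: the semaphore value is the return value (-1 on failure). */
int get_value(int semid, int semnum)
{
    return semctl(semid, semnum, GETVAL);
}

/* IPC_RMID: remove the set and its data structure. Returns 0 or -1. */
int remove_set(int semid)
{
    return semctl(semid, 0, IPC_RMID);
}
```

Note that GETVAL delivers its result through the return value of the system call, while SETVAL takes its input from the union member.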

The arg argument is used to pass the system call the appropriate union 
member for the control command to be performed: 

■ arg.val 

■ arg.buf 

■ arg.array 

The details of this system call are discussed in the example program for 
it. If you have problems understanding the logic manipulations in this pro- 
gram, read the "Using semget" section of this chapter; it goes into more 
detail than what would be practical to do for every system call. 

Example Program 

The example program in this section (Figure 9-10) is a menu driven pro- 
gram which allows all possible combinations of using the semctl(2) system 
call to be exercised. 
From studying this program, you can observe the method of passing
arguments and receiving return values. The user-written program require- 
ments are pointed out. 

This program begins (lines 5-9) by including the required header files as 
specified by the semctl(2) entry in the Programmer's Reference Manual. Note
that in this program errno is declared as an external variable, and therefore 
the errno.h header file does not have to be included. 

Variable, structure, and union names have been chosen to be as close as 
possible to those in the synopsis. Their declarations are self-explanatory. 
These names make the program more readable, and this is perfectly legal 
since they are local to the program. Those declared for this program and 
their purpose are as follows: 

■ semid_ds — used to receive the specified semaphore set identifier's
data structure when an IPC_STAT control command is performed

■ c — used to receive the input values from the scanf(3S) function (line
117) when performing a SETALL control command

■ i — used as a counter to increment through the union arg.array when
displaying the semaphore values for a GETALL (lines 97-99) control
command, and when initializing the arg.array when performing a
SETALL (lines 115-119) control command

■ length— used as a variable to test for the number of semaphores in a 
set against the i counter variable (lines 97, 115) 

■ uid— used to store the IPC_SET value for the effective user 
identification 

■ gid — used to store the IPC_SET value for the effective group
identification 

■ mode— used to store the IPC_SET value for the operation permissions 

■ rtrn — used to store the return integer from the system call which
depends upon the control command or a -1 when unsuccessful

■ semid— used to store and pass the semaphore set identifier to the sys- 
tem call 

■ semnum — used to store and pass the semaphore number to the sys- 
tem call 
■ cmd — used to store the code for the desired control command so that
subsequent processing can be performed on it

■ choice — used to determine which member (uid, gid, mode) is to be
changed by the IPC_SET control command

■ arg.val — used to pass the system call a value to set (SETVAL) or to 
store (GETVAL) a value returned from the system call for a single 
semaphore (union member) 

■ arg.buf — a pointer passed to the system call which locates the data 
structure in the user memory area where the IPC_STAT control com- 
mand is to place its return values, or where the IPC_SET command 
gets the values to set (union member) 

■ arg.array — used to store the set of semaphore values when getting
(GETALL) or initializing (SETALL) (union member).

Note that the semid_ds data structure in this program (line 14) uses the 
data structure located in the sem.h header file of the same name as a tem- 
plate for its declaration. This is a perfect example of the advantage of local 
variables. 

The arg union (lines 18-22) serves three purposes in one. The compiler 
allocates enough storage to hold its largest member. The program can then 
use the union as any member by referencing union members as if they were 
regular structure members. Note that the array is declared to have 25 ele- 
ments (0 through 24). This number corresponds to the maximum number of 
semaphores allowed per set (SEMMSL), a system tunable parameter. 
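
As a rough illustration of the storage point made above, the sketch below declares a union shaped like the one in the example program and checks that it is at least as large as each member. The names `semarg`, `NSEMS`, and `union_holds_largest_member` are hypothetical, chosen to avoid clashing with a `union semun` that some systems already define. Many current systems declare the `array` member as a pointer rather than an embedded array, so treat the layout as an assumption tied to this example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#define NSEMS 25  /* assumed to match SEMMSL, as in the example program */

/* Hypothetical name; some systems already define union semun in <sys/sem.h>. */
union semarg {
    int val;                      /* value for SETVAL, or result of GETVAL */
    struct semid_ds *buf;         /* buffer for IPC_STAT and IPC_SET */
    unsigned short array[NSEMS];  /* values for GETALL and SETALL */
};

/* Returns nonzero if the union is large enough to hold each of its members. */
int union_holds_largest_member(void)
{
    return sizeof(union semarg) >= sizeof(int)
        && sizeof(union semarg) >= sizeof(struct semid_ds *)
        && sizeof(union semarg) >= sizeof(unsigned short[NSEMS]);
}
```

Calling `union_holds_largest_member()` from a small `main()` and printing `sizeof(union semarg)` shows how much storage the compiler reserved on a given machine.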

The next important program aspect to observe is that although the *buf 
pointer member (arg.buf ) of the union is declared to be a pointer to a data 
structure of the semid_ds type, it must also be initialized to contain the 
address of the user memory area data structure (line 24). Because of the 
way this program is written, the pointer does not need to be reinitialized 
later. If it was used to increment through the array, it would need to be 
reinitialized just before calling the system call. 

Now that all of the required declarations have been presented for this 
program, this is how it works. 

First, the program prompts for a valid semaphore set identifier, which is 
stored at the address of the semid variable (lines 25-27). This is required for 
all semctl(2) system calls. 

Then, the code for the desired control command must be entered (lines 
28-42), and the code is stored at the address of the cmd variable. The code 
is tested to determine the control command for subsequent processing. 

If the GETVAL control command is selected (code 1), a message prompt- 
ing for a semaphore number is displayed (lines 49, 50). When it is entered, 
it is stored at the address of the semnum variable (line 51). Then, the sys- 
tem call is performed, and the semaphore value is displayed (lines 52-55). If 
the system call is successful, a message indicates this along with the sema- 
phore set identifier used (lines 195, 196); if the system call is unsuccessful, 
an error message is displayed along with the value of the external errno 
variable (lines 191-193). 

If the SETVAL control command is selected (code 2), a message prompt- 
ing for a semaphore number is displayed (lines 56, 57). When it is entered, 
it is stored at the address of the semnum variable (line 58). Next, a message 
prompts for the value to which the semaphore is to be set, and it is stored 
as the arg.val member of the union (lines 59, 60). Then, the system call is 
performed (lines 61, 63). Depending upon success or failure, the program 
returns the same messages as for GETVAL above. 
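
The SETVAL and GETVAL steps can be sketched without the interactive prompts. The version below is a minimal sketch, not the book's program: it assumes a system where semctl() takes the union itself as its fourth argument, creates its own private set with semget() so it does not need an existing semid, and removes the set afterwards. The names `semarg` and `set_then_get` are hypothetical.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical name for the semctl() argument union. */
union semarg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a private one-semaphore set, set semaphore 0 to value,
   read it back, and remove the set.
   Returns the value read back, or -1 if any call fails. */
int set_then_get(int value)
{
    union semarg arg;
    int semid, retrn;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    arg.val = value;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    /* For GETVAL the semaphore value is the return integer. */
    retrn = semctl(semid, 0, GETVAL);

    semctl(semid, 0, IPC_RMID);
    return retrn;
}
```

For example, `set_then_get(5)` is expected to hand back the value it just stored, provided the system allows the calling process to create a semaphore set.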

If the GETPID control command is selected (code 3), the system call is 
made immediately since all required arguments are known (lines 64-67), 
and the PID of the process performing the last operation is displayed. 
Depending upon success or failure, the program returns the same messages 
as for GETVAL above. 

If the GETNCNT control command is selected (code 4), a message 
prompting for a semaphore number is displayed (lines 68-72). When 
entered, it is stored at the address of the semnum variable (line 73). Then, 
the system call is performed, and the number of processes waiting for the 
semaphore to become greater than its current value is displayed (lines 74- 
77). Depending upon success or failure, the program returns the same mes- 
sages as for GETVAL above. 

If the GETZCNT control command is selected (code 5), a message 
prompting for a semaphore number is displayed (lines 78-81). When it is 
entered, it is stored at the address of the semnum variable (line 82). Then 
the system call is performed, and the number of processes waiting for the 
semaphore value to become equal to zero is displayed (lines 83, 86). 
Depending upon success or failure, the program returns the same messages 
as for GETVAL above. 

If the GETALL control command is selected (code 6), the program first 
performs an IPC_STAT control command to determine the number of sema- 
phores in the set (lines 88-93). The length variable is set to the number of 
semaphores in the set (line 91). Next, the system call is made and, upon 
success, the arg.array union member contains the values of the semaphore 
set (line 96). Now, a loop is entered which displays each element of the 
arg.array from zero to one less than the value of length (lines 97-103). The 
semaphores in the set are displayed on a single line, separated by a space. 
Depending upon success or failure, the program returns the same messages 
as for GETVAL above. 

If the SETALL control command is selected (code 7), the program first 
performs an IPC_STAT control command to determine the number of sema- 
phores in the set (lines 106-108). The length variable is set to the number 
of semaphores in the set (line 109). Next, the program prompts for the 
values to be set and enters a loop which takes values from the keyboard and 
initializes the arg.array union member to contain the desired values of the 
semaphore set (lines 113-119). The loop puts the first entry into the array 
position for semaphore number zero and ends when the semaphore number 
that is filled in the array equals one less than the value of length. The sys- 
tem call is then made (lines 120-122). Depending upon success or failure, 
the program returns the same messages as for GETVAL above. 
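
The GETALL and SETALL sequence (IPC_STAT to learn the set length, then the array transfer) can be sketched as follows. This is an illustrative sketch under assumptions: the set size `NSEMS`, the union name `semarg`, and the function name are hypothetical, the `array` member is a pointer as on many current systems, and the set is created privately and removed at the end.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#define NSEMS 3   /* assumed set size for this sketch */

/* Hypothetical name for the semctl() argument union. */
union semarg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Returns 1 if the values written with SETALL come back from GETALL. */
int setall_getall_roundtrip(void)
{
    struct semid_ds ds;
    unsigned short in[NSEMS] = { 4, 0, 7 };
    unsigned short out[NSEMS];
    union semarg arg;
    unsigned long i, length;
    int semid, ok = 0;

    semid = semget(IPC_PRIVATE, NSEMS, IPC_CREAT | 0600);
    if (semid == -1)
        return 0;

    /* IPC_STAT reports the number of semaphores in the set. */
    arg.buf = &ds;
    if (semctl(semid, 0, IPC_STAT, arg) == -1)
        goto done;
    length = ds.sem_nsems;
    if (length != NSEMS)          /* guard the fixed-size arrays */
        goto done;

    arg.array = in;
    if (semctl(semid, 0, SETALL, arg) == -1)
        goto done;

    arg.array = out;
    if (semctl(semid, 0, GETALL, arg) == -1)
        goto done;

    ok = 1;
    for (i = 0; i < length; i++)
        if (out[i] != in[i])
            ok = 0;

done:
    semctl(semid, 0, IPC_RMID);
    return ok;
}
```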

If the IPC_STAT control command is selected (code 8), the system call is 
performed (line 127), and the status information returned is printed out 
(lines 128-139); only the members that can be set are printed out in this pro- 
gram. Note that if the system call is unsuccessful, the status information of 
the last successful one is printed out. In addition, an error message is 
displayed, and the errno variable is printed out (lines 191, 192). 

If the IPC_SET control command is selected (code 9), the program gets 
the current status information for the semaphore set identifier specified 
(lines 143-146). This is necessary because this example program provides for 
changing only one member at a time, and the semctl(2) system call changes 
all of them. Also, if an invalid value happened to be stored in the user 
memory area for one of these members, it would cause repetitive failures for 
this control command until corrected. The next thing the program does is 
to prompt for a code corresponding to the member to be changed (lines 
147-153). This code is stored at the address of the choice variable (line 154). 
Now, depending upon the member picked, the program prompts for the 
new value (lines 155-178). The value is placed at the address of the 
appropriate member in the user memory area data structure, and the system 
call is made (line 181). Depending upon success or failure, the program 
returns the same messages as for GETVAL above. 
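
The reason for fetching the status before IPC_SET can be seen in a short sketch: the structure is filled by IPC_STAT, one member is changed, and the whole structure is handed back. The code below is a sketch under assumptions (hypothetical names `semarg` and `mode_after_ipc_set`, a privately created set, and a union passed by value as on many current systems), not the book's listing.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical name for the semctl() argument union. */
union semarg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Change only the permission bits of a private semaphore set.
   Returns the permission bits read back afterwards, or -1 on failure. */
int mode_after_ipc_set(int new_mode)
{
    struct semid_ds ds;
    union semarg arg;
    int semid, result = -1;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    arg.buf = &ds;

    /* Get the current uid, gid, and mode so that the members
       left alone keep their present values. */
    if (semctl(semid, 0, IPC_STAT, arg) == -1)
        goto done;

    /* Change one member, then set all of them. */
    ds.sem_perm.mode = (ds.sem_perm.mode & ~0777) | (new_mode & 0777);
    if (semctl(semid, 0, IPC_SET, arg) == -1)
        goto done;

    if (semctl(semid, 0, IPC_STAT, arg) == -1)
        goto done;
    result = ds.sem_perm.mode & 0777;

done:
    semctl(semid, 0, IPC_RMID);
    return result;
}
```

A call such as `mode_after_ipc_set(0640)` keeps read and alter permission for the owner, so the follow-up IPC_STAT and IPC_RMID calls are still permitted.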

If the IPC_RMID control command (code 10) is selected, the system call 
is performed (lines 183-185). The semid along with its associated data struc- 
ture and semaphore set is removed from the UNIX operating system. 
Depending upon success or failure, the program returns the same messages 
as for the other control commands. 

The example program for the semctl(2) system call follows. It is sug- 
gested that the source program file be named semctl.c and that the execut- 
able file be named semctl. 



  1    /*This is a program to illustrate
  2    **the semaphore control, semctl(),
  3    **system call capabilities.
  4    */

  5    /*Include necessary header files.*/
  6    #include    <stdio.h>
  7    #include    <sys/types.h>
  8    #include    <sys/ipc.h>
  9    #include    <sys/sem.h>

 10    /*Start of main C language program*/
 11    main()
 12    {
 13        extern int errno;
 14        struct semid_ds semid_ds;
 15        int c, i, length;
 16        int uid, gid, mode;
 17        int retrn, semid, semnum, cmd, choice;
 18        union semun {
 19            int val;
 20            struct semid_ds *buf;
 21            ushort array[25];
 22        } arg;

 23        /*Initialize the data structure pointer.*/
 24        arg.buf = &semid_ds;

 25        /*Enter the semaphore ID.*/
 26        printf("Enter the semid = ");
 27        scanf("%d", &semid);

 28        /*Choose the desired command.*/
 29        printf("\nEnter the number for\n");
 30        printf("the desired cmd:\n");
 31        printf("GETVAL     =  1\n");
 32        printf("SETVAL     =  2\n");
 33        printf("GETPID     =  3\n");
 34        printf("GETNCNT    =  4\n");
 35        printf("GETZCNT    =  5\n");
 36        printf("GETALL     =  6\n");
 37        printf("SETALL     =  7\n");
 38        printf("IPC_STAT   =  8\n");
 39        printf("IPC_SET    =  9\n");
 40        printf("IPC_RMID   =  10\n");
 41        printf("Entry      =  ");
 42        scanf("%d", &cmd);

 43        /*Check entries.*/
 44        printf("\nsemid =%d, cmd = %d\n\n",
 45            semid, cmd);

 46        /*Set the command and do the call.*/
 47        switch (cmd)
 48        {

 49        case 1: /*Get a specified value.*/
 50            printf("\nEnter the semnum = ");
 51            scanf("%d", &semnum);
 52            /*Do the system call.*/
 53            retrn = semctl(semid, semnum, GETVAL, 0);
 54            printf("\nThe semval = %d\n", retrn);
 55            break;

 56        case 2: /*Set a specified value.*/
 57            printf("\nEnter the semnum = ");
 58            scanf("%d", &semnum);
 59            printf("\nEnter the value = ");
 60            scanf("%d", &arg.val);
 61            /*Do the system call.*/
 62            retrn = semctl(semid, semnum, SETVAL, arg.val);
 63            break;

 64        case 3: /*Get the process ID.*/
 65            retrn = semctl(semid, 0, GETPID, 0);
 66            printf("\nThe sempid = %d\n", retrn);
 67            break;

 68        case 4: /*Get the number of processes
 69                  waiting for the semaphore to
 70                  become greater than its current
 71                  value.*/
 72            printf("\nEnter the semnum = ");
 73            scanf("%d", &semnum);
 74            /*Do the system call.*/
 75            retrn = semctl(semid, semnum, GETNCNT, 0);
 76            printf("\nThe semncnt = %d", retrn);
 77            break;

 78        case 5: /*Get the number of processes
 79                  waiting for the semaphore
 80                  value to become zero.*/
 81            printf("\nEnter the semnum = ");
 82            scanf("%d", &semnum);
 83            /*Do the system call.*/
 84            retrn = semctl(semid, semnum, GETZCNT, 0);
 85            printf("\nThe semzcnt = %d", retrn);
 86            break;

 87        case 6: /*Get all of the semaphores.*/
 88            /*Get the number of semaphores in
 89              the semaphore set.*/
 90            retrn = semctl(semid, 0, IPC_STAT, arg.buf);
 91            length = arg.buf->sem_nsems;
 92            if(retrn == -1)
 93                goto ERROR;
 94            /*Get and print all semaphores in the
 95              specified set.*/
 96            retrn = semctl(semid, 0, GETALL, arg.array);
 97            for (i = 0; i < length; i++)
 98            {
 99                printf("%d", arg.array[i]);
100                /*Separate each
101                  semaphore.*/
102                printf("%c", ' ');
103            }
104            break;

105        case 7: /*Set all semaphores in the set.*/
106            /*Get the number of semaphores in
107              the set.*/
108            retrn = semctl(semid, 0, IPC_STAT, arg.buf);
109            length = arg.buf->sem_nsems;
110            printf("Length = %d\n", length);
111            if(retrn == -1)
112                goto ERROR;
113            /*Set the semaphore set values.*/
114            printf("\nEnter each value:\n");
115            for(i = 0; i < length ; i++)
116            {
117                scanf("%d", &c);
118                arg.array[i] = c;
119            }
120            /*Do the system call.*/
121            retrn = semctl(semid, 0, SETALL, arg.array);
122            break;

123        case 8: /*Get the status for the
124                  semaphore set.*/
125            /*Get and print the current
126              status values.*/
127            retrn = semctl(semid, 0, IPC_STAT, arg.buf);
128            printf("\nThe USER ID = %d\n",
129                arg.buf->sem_perm.uid);
130            printf("The GROUP ID = %d\n",
131                arg.buf->sem_perm.gid);
132            printf("The operation permissions = 0%o\n",
133                arg.buf->sem_perm.mode);
134            printf("The number of semaphores in set = %d\n",
135                arg.buf->sem_nsems);
136            printf("The last semop time = %d\n",
137                arg.buf->sem_otime);
138            printf("The last change time = %d\n",
139                arg.buf->sem_ctime);
140            break;

141        case 9: /*Select and change the desired
142                  member of the data structure.*/
143            /*Get the current status values.*/
144            retrn = semctl(semid, 0, IPC_STAT, arg.buf);
145            if(retrn == -1)
146                goto ERROR;
147            /*Select the member to change.*/
148            printf("\nEnter the number for the\n");
149            printf("member to be changed:\n");
150            printf("sem_perm.uid   = 1\n");
151            printf("sem_perm.gid   = 2\n");
152            printf("sem_perm.mode  = 3\n");
153            printf("Entry          = ");
154            scanf("%d", &choice);
155            switch(choice){
156            case 1: /*Change the user ID.*/
157                printf("\nEnter USER ID = ");
158                scanf("%d", &uid);
159                arg.buf->sem_perm.uid = uid;
160                printf("\nUSER ID = %d\n",
161                    arg.buf->sem_perm.uid);
162                break;
163            case 2: /*Change the group ID.*/
164                printf("\nEnter GROUP ID = ");
165                scanf("%d", &gid);
166                arg.buf->sem_perm.gid = gid;
167                printf("\nGROUP ID = %d\n",
168                    arg.buf->sem_perm.gid);
169                break;
170            case 3: /*Change the mode portion of
171                      the operation
172                      permissions.*/
173                printf("\nEnter MODE = ");
174                scanf("%o", &mode);
175                arg.buf->sem_perm.mode = mode;
176                printf("\nMODE = 0%o\n",
177                    arg.buf->sem_perm.mode);
178                break;
179            }
180            /*Do the change.*/
181            retrn = semctl(semid, 0, IPC_SET, arg.buf);
182            break;

183        case 10: /*Remove the semid along with its
184                   data structure.*/
185            retrn = semctl(semid, 0, IPC_RMID, 0);
186        }

187        /*Perform the following if the call is unsuccessful.*/
188        if(retrn == -1)
189        {
190    ERROR:
191            printf("\n\nThe semctl system call failed!\n");
192            printf("The error number = %d\n", errno);
193            exit(0);
194        }
195        printf("\n\nThe semctl system call was successful\n");
196        printf("for semid = %d\n", semid);
197        exit(0);
198    }





Figure 9-10: semctl() System Call Example 



Operations on Semaphores 

This section contains a detailed description of using the semop(2) sys- 
tem call along with an example program which allows all of its capabilities 
to be exercised. 

Using semop 

The synopsis found in the semop(2) entry in the Programmer's Reference 
Manual is as follows: 

#include <sys/types.h> 
#include <sys/ipc.h> 
#include <sys/sem.h> 

int semop (semid, sops, nsops) 
int semid; 
struct sembuf **sops; 
unsigned nsops; 

The semop(2) system call requires three arguments to be passed to it, 
and it returns an integer value. 

Upon successful completion, a zero value is returned and when unsuc- 
cessful it returns a —1. 

The semid argument must be a valid, non-negative, integer value. In 
other words, it must have already been created by using the semget(2) 
system call. 

The sops argument is a pointer to an array of structures in the user 
memory area that contains the following for each semaphore to be changed: 

■ the semaphore number 

■ the operation to be performed 

■ the control command (flags) 

The **sops declaration means that a pointer can be initialized to the 
address of the array, or the array name can be used since it is the address of 
the first element of the array. sembuf is the tag name of the data structure 
used as the template for the structure members in the array; it is located in 
the #include <sys/sem.h> header file. 

The nsops argument specifies the length of the array (the number of 
structures in the array). The maximum size of this array is determined by 
the SEMOPM system tunable parameter. Therefore, a maximum of 
SEMOPM operations can be performed for each semop(2) system call. 

The semaphore number determines the particular semaphore within the 
set on which the operation is to be performed. 

The operation to be performed is determined by the following: 

■ a positive integer value means to increment the semaphore value by 
that amount 

■ a negative integer value means to decrement the semaphore value by 
its absolute value 

■ a value of zero means to test if the semaphore is equal to zero 

The following operation commands (flags) can be used: 

■ IPC_NOWAIT— this operation command can be set for any opera- 
tions in the array. The system call will return unsuccessfully without 
changing any semaphore values at all if any operation for which 
IPC_NOWAIT is set cannot be performed successfully. The system 
call will be unsuccessful when trying to decrement a semaphore more 
than its current value, or when testing for a semaphore to be equal to 
zero when it is not. 

■ SEM_UNDO— this operation command allows any operations in the 
array to be undone when any operation in the array is unsuccessful 
and does not have the IPC_NOWAIT flag set. That is, the blocked 
operation waits until it can perform its operation; and when it and all 
succeeding operations are successful, all operations with the 
SEM_UNDO flag set are undone. Remember, no operations are per- 
formed on any semaphores in a set until all operations are successful. 
Undoing is accomplished by using an array of adjust values for the 
operations that are to be undone when the blocked operation and all 
subsequent operations are successful. 
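
The IPC_NOWAIT behavior described above can be observed with a single operation. The sketch below is a minimal example under assumptions: it creates a private semaphore, sets it to zero, and then tries to decrement it with IPC_NOWAIT set, so the call should return unsuccessfully instead of waiting. The names `semarg` and `nowait_decrement_errno` are hypothetical, and SEM_UNDO is not exercised here.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical name for the semctl() argument union. */
union semarg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Try to decrement a semaphore whose value is zero with IPC_NOWAIT set.
   Returns the errno value left by semop(), 0 if semop() unexpectedly
   succeeded, or -1 if the set could not be prepared. */
int nowait_decrement_errno(void)
{
    struct sembuf op;
    union semarg arg;
    int semid, saved = 0;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    arg.val = 0;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    op.sem_num = 0;            /* semaphore number within the set */
    op.sem_op  = -1;           /* decrement by one */
    op.sem_flg = IPC_NOWAIT;   /* return instead of waiting */

    if (semop(semid, &op, 1) == -1)
        saved = errno;         /* save before any later call changes it */

    semctl(semid, 0, IPC_RMID);
    return saved;
}
```

On systems that follow the semop(2) manual entry, the saved error number in this situation is EAGAIN.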

Example Program 

The example program in this section (Figure 9-11) is a menu driven pro- 
gram which allows all possible combinations of using the semop(2) system 
call to be exercised. 

From studying this program, you can observe the method of passing 
arguments and receiving return values. The user-written program 
requirements are pointed out. 

This program begins (lines 5-9) by including the required header files as 
specified by the semop(2) entry in the Programmer's Reference Manual. Note 
that in this program errno is declared as an external variable, and therefore, 
the errno.h header file does not have to be included. 

Variable and structure names have been chosen to be as close as possible 
to those in the synopsis. Their declarations are self-explanatory. These 
names make the program more readable, and this is perfectly legal since the 
declarations are local to the program. The variables declared for this pro- 
gram and their purpose are as follows: 

■ sembuf[10]— used as an array buffer (line 14) to contain a maximum 
of ten sembuf type structures; ten equals SEMOPM, the maximum 
number of operations on a semaphore set for each semop(2) system 
call 

■ *sops— used as a pointer (line 14) to sembuf[10] for the system call 
and for accessing the structure members within the array 

■ retrn — used to store the return values from the system call 

■ flags— used to store the code of the IPC_NOWAIT or SEM_UNDO 
flags for the semop(2) system call (line 60) 

■ i— used as a counter (line 32) for initializing the structure members 
in the array, and used to print out each structure in the array (line 
79) 

■ nsops— used to specify the number of semaphore operations for the 
system call— must be less than or equal to SEMOPM 

■ semid — used to store the desired semaphore set identifier for the sys- 
tem call 

First, the program prompts for a semaphore set identifier that the sys- 
tem call is to perform operations on (lines 19-22). Semid is stored at the 
address of the semid variable (line 23). 

A message is displayed requesting the number of operations to be per- 
formed on this set (lines 25-27). The number of operations is stored at the 
address of the nsops variable (line 28). 

Next a loop is entered to initialize the array of structures (lines 30-77). 
The semaphore number, operation, and operation command (flags) are 
entered for each structure in the array. The number of structures equals the 
number of semaphore operations (nsops) to be performed for the system 
call, so nsops is tested against the i counter for loop control. Note that sops 
is used as a pointer to each element (structure) in the array, and sops is 
incremented just like i. sops is then used to point to each member in the 
structure for setting them. 
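
The pointer-walking loop described above can be sketched in a non-interactive form. The code below is illustrative only: it fills two sembuf structures through a moving sops pointer, resets the pointer, and performs both operations in one semop() call. The private set, the fixed amounts, and the names `semarg` and `two_ops_sum` are assumptions made for the sketch.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical name for the semctl() argument union. */
union semarg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Increment semaphore 0 by 2 and semaphore 1 by 3 in a single semop() call.
   Returns the sum of the two resulting values, or -1 on failure. */
int two_ops_sum(void)
{
    struct sembuf sembuf[2], *sops;
    unsigned short zeros[2] = { 0, 0 };
    short amounts[2] = { 2, 3 };
    union semarg arg;
    unsigned nsops = 2;
    int semid, i, v0, v1, sum = -1;

    semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    /* Start from known values. */
    arg.array = zeros;
    if (semctl(semid, 0, SETALL, arg) == -1)
        goto done;

    /* sops is incremented along with i, one structure per operation. */
    sops = sembuf;
    for (i = 0; i < (int)nsops; i++, sops++) {
        sops->sem_num = i;
        sops->sem_op  = amounts[i];
        sops->sem_flg = 0;
    }

    sops = sembuf;   /* reset the pointer to sembuf[0] before the call */
    if (semop(semid, sops, nsops) == -1)
        goto done;

    v0 = semctl(semid, 0, GETVAL);
    v1 = semctl(semid, 1, GETVAL);
    if (v0 != -1 && v1 != -1)
        sum = v0 + v1;

done:
    semctl(semid, 0, IPC_RMID);
    return sum;
}
```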

After the array is initialized, all of its elements are printed out for feed- 
back (lines 78-85). 

The sops pointer is set to the address of the array (lines 86, 87). The array 
name sembuf could be used directly, if desired, instead of sops in the system call. 

The system call is made (line 89), and depending upon success or 
failure, a corresponding message is displayed. The results of the 
operation(s) can be viewed by using the semctl() GETALL control command. 

The example program for the semop(2) system call follows. It is sug- 
gested that the source program file be named semop.c and that the execut- 
able file be named semop. 

  1    /*This is a program to illustrate
  2    **the semaphore operations, semop(),
  3    **system call capabilities.
  4    */

  5    /*Include necessary header files.*/
  6    #include    <stdio.h>
  7    #include    <sys/types.h>
  8    #include    <sys/ipc.h>
  9    #include    <sys/sem.h>

 10    /*Start of main C language program*/
 11    main()
 12    {
 13        extern int errno;
 14        struct sembuf sembuf[10], *sops;
 15        char string[];
 16        int retrn, flags, sem_num, i, semid;
 17        unsigned nsops;
 18        sops = sembuf; /*Pointer to array sembuf.*/

 19        /*Enter the semaphore ID.*/
 20        printf("\nEnter the semid of\n");
 21        printf("the semaphore set to\n");
 22        printf("be operated on = ");
 23        scanf("%d", &semid);
 24        printf("\nsemid = %d", semid);

 25        /*Enter the number of operations.*/
 26        printf("\nEnter the number of semaphore\n");
 27        printf("operations for this set = ");
 28        scanf("%d", &nsops);
 29        printf("\nnosops = %d", nsops);

 30        /*Initialize the array for the
 31          number of operations to be performed.*/
 32        for(i = 0; i < nsops; i++, sops++)
 33        {

 34            /*This determines the semaphore in
 35              the semaphore set.*/
 36            printf("\nEnter the semaphore\n");
 37            printf("number (sem_num) = ");
 38            scanf("%d", &sem_num);
 39            sops->sem_num = sem_num;
 40            printf("\nThe sem_num = %d", sops->sem_num);

 41            /*Enter a (-)number to decrement,
 42              an unsigned number (no +) to increment,
 43              or zero to test for zero. These values
 44              are entered into a string and converted
 45              to integer values.*/
 46            printf("\nEnter the operation for\n");
 47            printf("the semaphore (sem_op) = ");
 48            scanf("%s", string);
 49            sops->sem_op = atoi(string);
 50            printf("\nsem_op = %d\n", sops->sem_op);

 51            /*Specify the desired flags.*/
 52            printf("\nEnter the corresponding\n");
 53            printf("number for the desired\n");
 54            printf("flags:\n");
 55            printf("No flags                  = 0\n");
 56            printf("IPC_NOWAIT                = 1\n");
 57            printf("SEM_UNDO                  = 2\n");
 58            printf("IPC_NOWAIT and SEM_UNDO   = 3\n");
 59            printf("            Flags         = ");
 60            scanf("%d", &flags);

 61            switch(flags)
 62            {
 63            case 0:
 64                sops->sem_flg = 0;
 65                break;
 66            case 1:
 67                sops->sem_flg = IPC_NOWAIT;
 68                break;
 69            case 2:
 70                sops->sem_flg = SEM_UNDO;
 71                break;
 72            case 3:
 73                sops->sem_flg = IPC_NOWAIT | SEM_UNDO;
 74                break;
 75            }
 76            printf("\nFlags = 0%o\n", sops->sem_flg);
 77        }

 78        /*Print out each structure in the array.*/
 79        for(i = 0; i < nsops; i++)
 80        {
 81            printf("\nsem_num = %d\n", sembuf[i].sem_num);
 82            printf("sem_op = %d\n", sembuf[i].sem_op);
 83            printf("sem_flg = %o\n", sembuf[i].sem_flg);
 84            printf("%c", ' ');
 85        }

 86        sops = sembuf; /*Reset the pointer to
 87                         sembuf[0].*/

 88        /*Do the semop system call.*/
 89        retrn = semop(semid, sops, nsops);
 90        if(retrn == -1) {
 91            printf("\nSemop failed. ");
 92            printf("Error = %d\n", errno);
 93        }
 94        else {
 95            printf("\nSemop was successful\n");
 96            printf("for semid = %d\n", semid);
 97            printf("Value returned = %d\n", retrn);
 98        }
 99    }




Figure 9-11: semop(2) System Call Example 

The shared memory type of IPC allows two or more processes (executing 
programs) to share memory and consequently the data contained there. 
This is done by allowing processes to set up access to a common virtual 
memory address space. This sharing occurs on a segment basis, which is 
memory management hardware dependent. 

This sharing of memory provides the fastest means of exchanging data 
between processes. However, processes that reference a shared memory 
segment must reside on one processor. Consequently, processes running on 
different processors (such as in an RFS network or a multiprocessing 
environment) may not be able to use shared memory segments. 

A process initially creates a shared memory segment facility using the 
shmget(2) system call. Upon creation, this process sets the overall operation 
permissions for the shared memory segment facility, sets its size in bytes, 
and can specify that the shared memory segment is for reference only 
(read-only) upon attachment. If the memory segment is not specified to be 
for reference only, all other processes with appropriate operation permis- 
sions can read from or write to the memory segment. 

There are two operations that can be performed on a shared memory 
segment: 

■ shmat(2) — shared memory attach 

■ shmdt(2) — shared memory detach 

Shared memory attach allows processes to associate themselves with the 
shared memory segment if they have permission. They can then read or 
write as allowed. 

Shared memory detach allows processes to disassociate themselves from 
a shared memory segment. Therefore, they lose the ability to read from or 
write to the shared memory segment. 
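
The attach and detach operations can be sketched with one process attaching the same segment twice. This is a minimal sketch under assumptions: the 4096-byte size, the message text, and the function name `attach_detach_roundtrip` are hypothetical, and the segment is created with a private key and removed at the end. The second attach uses the SHM_RDONLY flag to show a reference-only attachment.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Attach a private segment, write to it, detach, attach again read-only,
   and check that the data is still there.
   Returns 1 on success, 0 on any failure. */
int attach_detach_roundtrip(void)
{
    const char msg[] = "hello";
    char *addr;
    int shmid, ok = 0;

    shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid == -1)
        return 0;

    /* First attach: read and write access. */
    addr = (char *)shmat(shmid, NULL, 0);
    if (addr == (char *)-1)
        goto done;
    strcpy(addr, msg);
    if (shmdt(addr) == -1)       /* after this, addr must not be used */
        goto done;

    /* Second attach: for reference only. */
    addr = (char *)shmat(shmid, NULL, SHM_RDONLY);
    if (addr == (char *)-1)
        goto done;
    ok = (strcmp(addr, msg) == 0);
    shmdt(addr);

done:
    shmctl(shmid, IPC_RMID, NULL);
    return ok;
}
```

In practice two cooperating processes would each call shmat() with the same shmid; a single process is used here only to keep the sketch short.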

The original owner/creator of a shared memory segment can relinquish 
ownership to another process using the shmctl(2) system call. However, 
the creating process remains the creator until the facility is removed or the 
system is reinitialized. Other processes with permission can perform other 
functions on the shared memory segment using the shmctl(2) system call. 

System calls, which are documented in the Programmer's Reference 
Manual, make these shared memory capabilities available to processes. The 
calling process passes arguments to a system call, and the system call either 
successfully or unsuccessfully performs its function. If the system call is 
successful, it performs its function and returns the appropriate information. 
Otherwise, a known error code (—1) is returned to the process, and the 
external variable errno is set accordingly. 



Using Shared Memory 

The sharing of memory between processes occurs on a virtual segment 
basis. There is one and only one instance of an individual shared memory 
segment existing in the UNIX operating system at any point in time. 

Before sharing of memory can be realized, a uniquely identified shared 
memory segment and data structure must be created. The unique identifier 
created is called the shared memory identifier (shmid); it is used to identify 
or reference the associated data structure. The data structure includes the 
following for each shared memory segment: 

■ operation permissions 

■ segment size 

■ segment descriptor 

■ process identification performing last operation 

■ process identification of creator 

■ current number of processes attached 

■ in memory number of processes attached 

■ last attach time 

■ last detach time 

■ last change time 

The C Programming Language data structure definition for the shared 
memory segment data structure is located in the /usr/include/sys/shm.h 
header file. It is as follows: 

/*
**  There is a shared mem id data structure for
**  each segment in the system.
*/

struct shmid_ds {
        struct ipc_perm shm_perm;     /* operation permission struct */
        int             shm_segsz;    /* segment size */
        struct region   *shm_reg;     /* ptr to region structure */
        char            pad[4];       /* for swap compatibility */
        ushort          shm_lpid;     /* pid of last shmop */
        ushort          shm_cpid;     /* pid of creator */
        ushort          shm_nattch;   /* used only for shminfo */
        ushort          shm_cnattch;  /* used only for shminfo */
        time_t          shm_atime;    /* last shmat time */
        time_t          shm_dtime;    /* last shmdt time */
        time_t          shm_ctime;    /* last change time */
};



Note that the shm_perm member of this structure uses ipc_perm as a 
template. The breakout for the operation permissions data structure is 
shown in Figure 9-1. 

The ipc_perm data structure is the same for all IPC facilities, and it is 
located in the #include <sys/ipc.h> header file. It is shown in the 
introduction section of "Messages." 

Figure 9-12 is a table that shows the shared memory state information. 

                  Shared Memory States

Lock Bit   Swap Bit   Allocated Bit   Implied State
   0          0             0         Unallocated Segment
   0          0             1         Incore
   0          1             0         Unused
   0          1             1         On Disk
   1          0             1         Locked Incore
   1          1             0         Unused
   1          0             0         Unused
   1          1             1         Unused

Figure 9-12: Shared Memory State Information 



The implied states of Figure 9-12 are as follows: 

■ Unallocated Segment — the segment associated with this segment 
descriptor has not been allocated for use. 

■ Incore — the shared segment associated with this descriptor has been 
allocated for use. Therefore, the segment does exist and is currently 
resident in memory. 

■ On Disk— the shared segment associated with this segment descrip- 
tor is currently resident on the swap device. 

■ Locked Incore— the shared segment associated with this segment 
descriptor is currently locked in memory and will not be a candidate 
for swapping until the segment is unlocked. Only the super-user 
may lock and unlock a shared segment. 

■ Unused— this state is currently unused and should never be 
encountered by the normal user in shared memory handling. 

The shmget(2) system call is used to perform two tasks: 

■ to get a new shmid and create an associated shared memory segment 
data structure for it 

■ to return an existing shmid that already has an associated shared 
memory segment data structure 

The task performed is determined by the value of the key argument 
passed to the shmget(2) system call. For the first task, if the key is not 
already in use for an existing shmid and IPC_CREAT flag is set in shmflg, a 
new shmid is returned with an associated shared memory segment data 
structure created for it provided no system tunable parameters would be 
exceeded. 

There is also a provision for specifying a key of value zero which is 
known as the private key (IPC_PRIVATE = 0); when specified, a new shmid 
is always returned with an associated shared memory segment data structure 
created for it unless a system tunable parameter would be exceeded. When 
the ipcs command is performed, the KEY field for the shmid is all zeros. 

For the second task, if a shmid exists for the key specified, the value of 
the existing shmid is returned. If it is not desired to have an existing 
shmid returned, a control command (IPC_EXCL) can be specified (set) in the 
shmflg argument passed to the system call. The details of using this system 
call are discussed in the "Using shmget" section of this chapter. 

When performing the first task, the process that calls shmget becomes 
the owner/creator, and the associated data structure is initialized 
accordingly. Remember, ownership can be changed, but the creating process 
always remains the creator; see the "Controlling Shared Memory" section in 
this chapter. The creator of the shared memory segment also determines 
the initial operation permissions for it. 
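
The two tasks of shmget(2) can be seen side by side in a short sketch. The code below is illustrative and rests on assumptions: the caller supplies a key that is not already in use, the 4096-byte size and the function name `key_lookup_demo` are hypothetical, and the segment is removed before returning.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Exercise both tasks of shmget() with one key.
   Returns 1 if the behavior matches the description, 0 if it does not,
   and -1 if the first creation fails (for example, the key is in use). */
int key_lookup_demo(key_t key)
{
    int first, second, second_errno, third, ok;

    /* First task: create a new shmid for an unused key. */
    first = shmget(key, 4096, IPC_CREAT | IPC_EXCL | 0600);
    if (first == -1)
        return -1;

    /* With IPC_EXCL set, an existing shmid is not returned. */
    second = shmget(key, 4096, IPC_CREAT | IPC_EXCL | 0600);
    second_errno = errno;

    /* Second task: return the existing shmid for the key. */
    third = shmget(key, 4096, 0);

    ok = (second == -1 && second_errno == EEXIST && third == first);

    shmctl(first, IPC_RMID, NULL);
    return ok;
}
```

A caller might derive a key from its process ID for a quick experiment; a real application would more likely agree on a fixed key or use IPC_PRIVATE.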

Once a uniquely identified shared memory segment data structure is 
created, shared memory segment operations [shmop(2)] and control 
[shmctl(2)] can be used. 

Shared memory segment operations consist of attaching and detaching 
shared memory segments. System calls are provided for each of these 
operations; they are shmat(2) and shmdt(2). Refer to the "Operations for 
Shared Memory" section in this chapter for details of these system calls. 

Shared memory segment control is done by using the shmctl(2) system 
call. It permits you to control the shared memory facility in the following 
ways: 

■ to determine the associated data structure status for a shared memory 
segment (shmid) 

■ to change operation permissions for a shared memory segment 

■ to remove a particular shmid from the UNIX operating system along 
with its associated shared memory segment data structure 

■ to lock a shared memory segment in memory 

■ to unlock a shared memory segment 

Refer to the "Controlling Shared Memory" section in this chapter for 
details of the shmctl(2) system call. 



Getting Shared Memory Segments 

This section gives a detailed description of using the shmget(2) system 
call along with an example program illustrating its use. 

Using shmget 

The synopsis found in the shmget(2) entry in the Programmer's Reference 
Manual is as follows: 




#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int shmget (key, size, shmflg)
key_t key;
int size, shmflg;

All of these include files are located in the /usr/include/sys directory
of the UNIX operating system. The following line in the synopsis:

int shmget (key, size, shmflg)

informs you that shmget(2) is a function with three formal arguments that 
returns an integer type value, upon successful completion (shmid). The 
next two lines: 

key_t key; 

int size, shmflg; 

declare the types of the formal arguments. The type key_t is defined
by a typedef in the types.h header file to be an integer type.

The integer returned from this function upon successful completion is 
the shared memory identifier (shmid) that was discussed earlier. 

As declared, the process calling the shmget(2) system call must supply
three arguments to be passed to the formal key, size, and shmflg
arguments.

A new shmid with an associated shared memory data structure is pro- 
vided if either 

■ key is equal to IPC_PRIVATE, 

or 

■ key is passed a unique hexadecimal integer, and shmflg ANDed with
IPC_CREAT is TRUE.

The value passed to the shmflg argument must be an integer type octal 
value and will specify the following: 

■ access permissions 

■ execution modes 

■ control fields (commands) 

Access permissions determine the read/write attributes and execution
modes determine the user/group/other attributes of the shmflg argument.
They are collectively referred to as "operation permissions." Figure 9-13
reflects the numeric values (expressed in octal notation) for the valid
operation permissions codes.

Operation Permissions     Octal Value

Read by User              00400
Write by User             00200
Read by Group             00040
Write by Group            00020
Read by Others            00004
Write by Others           00002

Figure 9-13: Operation Permissions Codes



A specific octal value is derived by adding the octal values for the operation
permissions desired. That is, if read by user and read/write by others is
desired, the code value would be 00406 (00400 plus 00006). There are
constants located in the shm.h header file which can be used for the user
(OWNER). They are as follows:

SHM_R 0400 
SHM_W 0200 
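
As a rough illustration of how the codes in Figure 9-13 combine, the sketch below decodes an octal operation permissions value back into read/write letters. The helper name format_opperm and its output layout are assumptions made for this example; they are not part of the shared memory facility.

```c
/* Hypothetical helper (not part of any system header): render the six
 * operation-permission bits of `mode` as "rwrwrw" (user, group, others),
 * with '-' for a bit that is not set.  `out` needs room for 7 chars. */
void format_opperm(int mode, char out[7])
{
    /* Octal codes in the order listed in Figure 9-13. */
    static const int bits[6] = { 0400, 0200, 0040, 0020, 0004, 0002 };
    static const char letters[6] = { 'r', 'w', 'r', 'w', 'r', 'w' };
    int i;

    for (i = 0; i < 6; i++)
        out[i] = (mode & bits[i]) ? letters[i] : '-';
    out[6] = '\0';
}
```

For the value 00406 used in the text, the helper fills out with "r---rw": read by user, nothing for group, read and write by others.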

Control commands are predefined constants (represented by all uppercase
letters). Figure 9-14 contains the names of the constants that apply to
the shmget(2) system call along with their values. They are also referred to
as flags and are defined in the ipc.h header file.



Control Command     Value

IPC_CREAT           0001000
IPC_EXCL            0002000

Figure 9-14: Control Commands (Flags)

The value for shmflg is, therefore, a combination of operation permissions
and control commands. After determining the value for the operation
permissions as previously described, the desired flag(s) can be specified.
This is accomplished by bitwise ORing (|) them with the operation
permissions; the bit positions and values for the control commands in
relation to those of the operation permissions make this possible. It is
illustrated as follows:





                    Octal Value     Binary Value

IPC_CREAT           01000           000 001 000 000 000
| ORed by User      00400           000 000 100 000 000

shmflg              01400           000 001 100 000 000



The shmflg value can be easily set by using the names of the flags in
conjunction with the octal operation permissions value:

shmid = shmget (key, size, (IPC_CREAT | 0400));

shmid = shmget (key, size, (IPC_CREAT | IPC_EXCL | 0400));
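
The same combination can be wrapped in a small function. This is only a sketch: the name build_shmflg and its yes/no parameters are invented for illustration, and it is written with ANSI C prototypes rather than the older declaration style used in the synopsis.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: combine octal operation permissions with the
 * requested control commands into a value suitable for shmflg. */
int build_shmflg(int opperm, int want_creat, int want_excl)
{
    int shmflg = opperm & 0777;   /* keep only the permission bits */

    if (want_creat)
        shmflg |= IPC_CREAT;      /* bitwise OR in the control command */
    if (want_excl)
        shmflg |= IPC_EXCL;
    return shmflg;
}
```

With this helper, build_shmflg(0400, 1, 0) yields the same value as the expression (IPC_CREAT | 0400) shown above.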

As specified by the shmget(2) entry in the Programmer's Reference 
Manual, success or failure of this system call depends upon the argument 
values for key, size, and shmflg or system tunable parameters. The system 
call will attempt to return a new shmid if one of the following conditions is 
true: 

■ Key is equal to IPC_PRIVATE (0). 

■ Key does not already have a shmid associated with it, and (shmflg & 
IPC_CREAT) is "true" (not zero). 

The key argument can be set to IPC_PRIVATE in the following ways:

shmid = shmget (IPC_PRIVATE, size, shmflg);

or

shmid = shmget (0, size, shmflg);

This alone will cause the system call to be attempted because it satisfies the
first condition specified. Exceeding the SHMMNI system tunable parameter
always causes a failure. The SHMMNI system tunable parameter determines
the maximum number of unique shared memory segments (shmids)
in the UNIX operating system.

The second condition is satisfied if the value for key is not already
associated with a shmid and the bitwise AND of shmflg and IPC_CREAT is
"true" (not zero). This means that the key is unique (not in use) within the
UNIX operating system for this facility type and that the IPC_CREAT flag is
set (shmflg & IPC_CREAT is not zero). SHMMNI applies here also, just as for
condition one.

IPC_EXCL is another control command. Used in conjunction with
IPC_CREAT, it causes the system call to fail if, and only if, a shmid
already exists for the specified key. This is necessary to prevent the
process from thinking that it has received a new (unique) shmid when it has
not. In other words, when both IPC_CREAT and IPC_EXCL are specified, a
new, unique shmid is returned if the system call is successful. Any value for
shmflg returns a new shmid if the key equals zero (IPC_PRIVATE).
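
One way a program might use IPC_CREAT together with IPC_EXCL is sketched below: it first tries an exclusive create and falls back to looking up the existing shmid only when the failure was "already exists". The function name create_or_lookup, the 0600 permissions, and the created output parameter are assumptions for this example, and it is written against the ANSI C/POSIX prototypes rather than the declaration style of the synopsis.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: create a segment for `key` if none exists,
 * otherwise look up the existing one.  *created is set to 1 only when
 * this call made the segment.  Returns the shmid, or -1 with errno set. */
int create_or_lookup(key_t key, size_t size, int *created)
{
    int shmid = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);

    if (shmid != -1) {
        *created = 1;           /* IPC_EXCL guarantees the shmid is new */
        return shmid;
    }
    *created = 0;
    if (errno != EEXIST)
        return -1;              /* a real failure, not "already exists" */
    return shmget(key, 0, 0);   /* no IPC_CREAT: look up the existing shmid */
}
```

Because a key of IPC_PRIVATE always produces a new shmid, calling this helper with IPC_PRIVATE never takes the lookup path.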

The system call will fail if the value for the size argument is less than 
SHMMIN or greater than SHMMAX. These tunable parameters specify the 
minimum and maximum shared memory segment sizes. 

Refer to the shmget(2) manual page for specific associated data structure 
initialization for successful completion. The specific failure conditions with 
error names are contained there also. 

Example Program 

The example program in this section (Figure 9-15) is a menu driven pro- 
gram which allows all possible combinations of using the shmget(2) system 
call to be exercised. 

From studying this program, you can observe the method of passing 
arguments and receiving return values. The user-written program require- 
ments are pointed out. 

This program begins (lines 4-7) by including the required header files as 
specified by the shmget(2) entry in the Programmer's Reference Manual. Note 
that the errno.h header file is included as opposed to declaring errno as an 
external variable; either method will work. 

Variable names have been chosen to be as close as possible to those in
the synopsis for the system call. Their declarations are self-explanatory.
These names make the program more readable, and this is perfectly legal
since they are local to the program. The variables declared for this program
and their purposes are as follows:

■ key — used to pass the value for the desired key

■ opperm — used to store the desired operation permissions

■ flags — used to store the desired control commands (flags)

■ opperm_flags — used to store the combination from the logical ORing
of the opperm and flags variables; it is then used in the system call
to pass the shmflg argument

■ shmid — used for returning the shared memory segment identifier
for a successful system call or the error code (-1) for an unsuccessful
one

■ size — used to specify the shared memory segment size.

The program begins by prompting for a hexadecimal key, an octal
operation permissions code, and finally for the control command
combinations (flags) which are selected from a menu (lines 14-31). All
possible combinations are allowed even though they might not be viable.
This allows observing the errors for illegal combinations.

Next, the menu selection for the flags is combined with the operation 
permissions, and the result is stored at the address of the opperm_flags vari- 
able (lines 35-50). 

A display then prompts for the size of the shared memory segment, and 
it is stored at the address of the size variable (lines 51-54). 

The system call is made next, and the result is stored at the address of 
the shmid variable (line 56). 

Since the shmid variable now contains a valid shared memory segment
identifier or the error code (-1), it is tested to see if an error occurred
(line 58). If shmid equals -1, a message indicates that an error resulted and
the external errno variable is displayed (lines 60, 61).

If no error occurred, the returned shared memory segment identifier is 
displayed (line 65). 
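
The example program prints errno as a bare number. As a possible refinement, the sketch below formats the same outcome with the standard strerror() function so the message includes a description of the error. The helper name describe_shmget and the message wording are assumptions for this illustration; the caller is expected to save errno immediately after the shmget call and pass it in.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the outcome of a shmget() call.
 * `shmid` is the value shmget() returned; `err` is the errno value
 * saved right after the call.  Returns what snprintf() returns. */
int describe_shmget(int shmid, int err, char *out, size_t outlen)
{
    if (shmid == -1)
        return snprintf(out, outlen, "shmget failed: %s (errno %d)",
                        strerror(err), err);
    return snprintf(out, outlen, "shmid = %d", shmid);
}
```

The text returned by strerror() varies between systems, so only the fixed prefix of the failure message should be relied upon.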

The example program for the shmget(2) system call follows. It is
suggested that the source program file be named shmget.c and that the
executable file be named shmget.

 1   /*This is a program to illustrate
 2   **the shared memory get, shmget(),
 3   **system call capabilities.*/
 4   #include <sys/types.h>
 5   #include <sys/ipc.h>
 6   #include <sys/shm.h>
 7   #include <errno.h>
 8   /*Start of main C language program*/
 9   main()
10   {
11       key_t key;    /*declare as long integer*/
12       int opperm, flags;
13       int shmid, size, opperm_flags;
14       /*Enter the desired key*/
15       printf("Enter the desired key in hex = ");
16       scanf("%x", &key);
17       /*Enter the desired octal operation
18         permissions.*/
19       printf("\nEnter the operation\n");
20       printf("permissions in octal = ");
21       scanf("%o", &opperm);
22       /*Set the desired flags.*/
23       printf("\nEnter corresponding number to\n");
24       printf("set the desired flags:\n");
25       printf("No flags = 0\n");
26       printf("IPC_CREAT = 1\n");
27       printf("IPC_EXCL = 2\n");
28       printf("IPC_CREAT and IPC_EXCL = 3\n");
29       printf(" Flags = ");
30       /*Get the flag(s) to be set.*/
31       scanf("%d", &flags);
32       /*Check the values.*/
33       printf ("\nkey =0x%x, opperm = 0%o, flags = 0%o\n",
34           key, opperm, flags);
35       /*Incorporate the control fields (flags) with
36         the operation permissions*/
37       switch (flags)
38       {
39       case 0:    /*No flags are to be set.*/
40           opperm_flags = (opperm | 0);
41           break;
42       case 1:    /*Set the IPC_CREAT flag.*/
43           opperm_flags = (opperm | IPC_CREAT);
44           break;
45       case 2:    /*Set the IPC_EXCL flag.*/
46           opperm_flags = (opperm | IPC_EXCL);
47           break;
48       case 3:    /*Set the IPC_CREAT and IPC_EXCL flags.*/
49           opperm_flags = (opperm | IPC_CREAT | IPC_EXCL);
50       }
51       /*Get the size of the segment in bytes.*/
52       printf ("\nEnter the segment");
53       printf ("\nsize in bytes = ");
54       scanf ("%d", &size);
55       /*Call the shmget system call.*/
56       shmid = shmget (key, size, opperm_flags);
57       /*Perform the following if the call is unsuccessful.*/
58       if(shmid == -1)
59       {
60           printf ("\nThe shmget system call failed!\n");
61           printf ("The error number = %d\n", errno);
62       }
63       /*Return the shmid upon successful completion.*/
64       else
65           printf ("\nThe shmid = %d\n", shmid);
66       exit(0);
67   }




Figure 9-15: shmget(2) System Call Example 



Controlling Shared Memory 

This section gives a detailed description of using the shmctl(2) system
call along with an example program which allows all of its capabilities to be
exercised.

Using shmctl

The synopsis found in the shmctl(2) entry in the Programmer's Reference 
Manual is as follows: 



#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int shmctl (shmid, cmd, buf)
int shmid, cmd;
struct shmid_ds *buf;

The shmctl(2) system call requires three arguments to be passed to it, and 
shmctl(2) returns an integer value. 

Upon successful completion, a zero value is returned; when
unsuccessful, shmctl(2) returns -1.

The shmid variable must be a valid, non-negative, integer value. In 
other words, it must have already been created by using the shmget(2) sys- 
tem call. 

The cmd argument can be replaced by one of the following control
commands (flags):

■ IPC_STAT — return the status information contained in the associated 
data structure for the specified shmid and place it in the data struc- 
ture pointed to by the *buf pointer in the user memory area 

■ IPC_SET— for the specified shmid, set the effective user and group 
identification, and operation permissions 

■ IPC_RMID— remove the specified shmid along with its associated 
shared memory segment data structure 

■ SHM_LOCK — lock the specified shared memory segment in memory,
must be super-user

■ SHM_UNLOCK — unlock the shared memory segment from memory,
must be super-user.

A process must have an effective user identification of
OWNER/CREATOR or super-user to perform an IPC_SET or IPC_RMID
control command. Only the super-user can perform a SHM_LOCK or
SHM_UNLOCK control command. A process must have read permission to
perform the IPC_STAT control command.
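
A minimal sketch of the IPC_STAT control command follows. It copies the data structure for a shmid into a local struct shmid_ds and extracts two members, shm_segsz and shm_nattch. The helper name segment_info and its output parameters are assumptions made for this example, and ANSI C prototypes are used.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: fetch the size in bytes and the current attach
 * count for `shmid` using IPC_STAT.  Returns 0 on success, or -1 with
 * errno set by shmctl(). */
int segment_info(int shmid, size_t *size, unsigned long *nattch)
{
    struct shmid_ds ds;     /* user memory area that receives the status */

    if (shmctl(shmid, IPC_STAT, &ds) == -1)
        return -1;
    *size = ds.shm_segsz;
    *nattch = (unsigned long) ds.shm_nattch;
    return 0;
}
```

The call fails with -1 when the shmid is not valid or when the calling process lacks read permission for the segment.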

The details of this system call are discussed in the example program for 
it. If you have problems understanding the logic manipulations in this pro- 
gram, read the "Using shmget" section of this chapter; it goes into more 
detail than what would be practical to do for every system call. 

Example Program 

The example program in this section (Figure 9-16) is a menu driven pro- 
gram which allows all possible combinations of using the shmctl(2) system 
call to be exercised. 

From studying this program, you can observe the method of passing 
arguments and receiving return values. The user-written program require- 
ments are pointed out. 

This program begins (lines 5-9) by including the required header files as
specified by the shmctl(2) entry in the Programmer's Reference Manual. Note
in this program that errno is declared as an external variable, and therefore,
the errno.h header file does not have to be included.

Variable and structure names have been chosen to be as close as possible 
to those in the synopsis for the system call. Their declarations are self- 
explanatory. These names make the program more readable, and it is per- 
fectly legal since they are local to the program. The variables declared for 
this program and their purposes are as follows: 

■ uid — used to store the IPC_SET value for the effective user
identification

■ gid — used to store the IPC_SET value for the effective group
identification

■ mode — used to store the IPC_SET value for the operation permissions

■ rtrn — used to store the return integer value from the system call

■ shmid — used to store and pass the shared memory segment identifier 
to the system call 

■ command — used to store the code for the desired control command 
so that subsequent processing can be performed on it 

■ choice — used to determine which member for the IPC_SET control 
command that is to be changed 

■ shmid_ds — used to receive the specified shared memory segment 
identifier's data structure when an IPC_STAT control command is 
performed 

■ *buf — a pointer passed to the system call which locates the data
structure in the user memory area where the IPC_STAT control
command is to place its return values or where the IPC_SET command
gets the values to set.

Note that the shmid_ds data structure in this program (line 16) uses the 
data structure located in the shm.h header file of the same name as a tem- 
plate for its declaration. This is a perfect example of the advantage of local 
variables. 

The next important thing to observe is that although the *buf pointer is 
declared to be a pointer to a data structure of the shmid_ds type, it must 
also be initialized to contain the address of the user memory area data struc- 
ture (line 17). 

Now that all of the required declarations have been explained for this 
program, this is how it works. 

First, the program prompts for a valid shared memory segment 
identifier which is stored at the address of the shmid variable (lines 18-20). 
This is required for every shmctl(2) system call. 

Then, the code for the desired control command must be entered (lines 
21-29), and it is stored at the address of the command variable. The code is 
tested to determine the control command for subsequent processing. 

If the IPC_STAT control command is selected (code 1), the system call is
performed (lines 39, 40) and the status information returned is printed out
(lines 41-71). Note that if the system call is unsuccessful (line 146), the
status information of the last successful call is printed out. In addition, an
error message is displayed and the errno variable is printed out (lines 148,
149). If the system call is successful, a message indicates this along with the
shared memory segment identifier used (lines 151-154).

If the IPC_SET control command is selected (code 2), the first thing 
done is to get the current status information for the shared memory 
identifier specified (lines 90-92). This is necessary because this example pro- 
gram provides for changing only one member at a time, and the system call 
changes all of them. Also, if an invalid value happened to be stored in the 
user memory area for one of these members, it would cause repetitive 
failures for this control command until corrected. The next thing the pro- 
gram does is to prompt for a code corresponding to the member to be 
changed (lines 93-98). This code is stored at the address of the choice vari- 
able (line 99). Now, depending upon the member picked, the program 
prompts for the new value (lines 105-127). The value is placed at the 
address of the appropriate member in the user memory area data structure, 
and the system call is made (lines 128-130). Depending upon success or 
failure, the program returns the same messages as for IPC_STAT above. 
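
The read-then-modify sequence described for IPC_SET can be sketched as follows for the operation permissions member. The helper name set_segment_mode is an assumption for this example; the point it illustrates is that IPC_STAT is issued first so that the uid and gid values passed back with IPC_SET are the current ones.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: change only the operation permissions of
 * `shmid`, leaving the owner's user and group IDs as they are.
 * Returns 0 on success, or -1 with errno set by shmctl(). */
int set_segment_mode(int shmid, int mode)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) == -1)   /* start from current values */
        return -1;
    /* Replace the nine permission bits; keep any other bits unchanged. */
    ds.shm_perm.mode = (ds.shm_perm.mode & ~0777) | (mode & 0777);
    return shmctl(shmid, IPC_SET, &ds);
}
```

Skipping the IPC_STAT step would pass whatever happened to be in the local structure for the user and group identification, which is the repetitive-failure situation the text warns about.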

If the IPC_RMID control command (code 3) is selected, the system call is
performed (lines 132-135), and the shmid along with its associated shared
memory segment and data structure are removed from the UNIX operating system.
Note that the *buf pointer is not required as an argument to perform this
control command and its value can be zero or NULL. Depending upon the
success or failure, the program returns the same messages as for the other
control commands.

If the SHM_LOCK control command (code 4) is selected, the system call 
is performed (lines 137,138). Depending upon the success or failure, the 
program returns the same messages as for the other control commands. 

If the SHM_UNLOCK control command (code 5) is selected, the system 
call is performed (lines 140-142). Depending upon the success or failure, 
the program returns the same messages as for the other control commands. 

The example program for the shmctl(2) system call follows. It is
suggested that the source program file be named shmctl.c and that the
executable file be named shmctl.

  1   /*This is a program to illustrate
  2   **the shared memory control, shmctl(),
  3   **system call capabilities.
  4   */
  5   /*Include necessary header files.*/
  6   #include <stdio.h>
  7   #include <sys/types.h>
  8   #include <sys/ipc.h>
  9   #include <sys/shm.h>
 10   /*Start of main C language program*/
 11   main()
 12   {
 13       extern int errno;
 14       int uid, gid, mode;
 15       int rtrn, shmid, command, choice;
 16       struct shmid_ds shmid_ds, *buf;
 17       buf = &shmid_ds;
 18       /*Get the shmid, and command.*/
 19       printf("Enter the shmid = ");
 20       scanf("%d", &shmid);
 21       printf("\nEnter the number for\n");
 22       printf("the desired command:\n");
 23       printf("IPC_STAT = 1\n");
 24       printf("IPC_SET = 2\n");
 25       printf("IPC_RMID = 3\n");
 26       printf("SHM_LOCK = 4\n");
 27       printf("SHM_UNLOCK = 5\n");
 28       printf("Entry = ");
 29       scanf("%d", &command);
 30       /*Check the values.*/
 31       printf ("\nshmid =%d, command = %d\n",
 32           shmid, command);
 33       switch (command)
 34       {
 35       case 1:    /*Use shmctl() to duplicate
 36                    the data structure for
 37                    shmid in the shmid_ds area pointed
 38                    to by buf and then print it out.*/
 39           rtrn = shmctl(shmid, IPC_STAT,
 40               buf);
 41           printf ("\nThe USER ID = %d\n",
 42               buf->shm_perm.uid);
 43           printf ("The GROUP ID = %d\n",
 44               buf->shm_perm.gid);
 45           printf ("The creator's ID = %d\n",
 46               buf->shm_perm.cuid);
 47           printf ("The creator's group ID = %d\n",
 48               buf->shm_perm.cgid);
 49           printf ("The operation permissions = 0%o\n",
 50               buf->shm_perm.mode);
 51           printf ("The slot usage sequence\n");
 52           printf ("number = 0%x\n",
 53               buf->shm_perm.seq);
 54           printf ("The key = 0%x\n",
 55               buf->shm_perm.key);
 56           printf ("The segment size = %d\n",
 57               buf->shm_segsz);
 58           printf ("The pid of last shmop = %d\n",
 59               buf->shm_lpid);
 60           printf ("The pid of creator = %d\n",
 61               buf->shm_cpid);
 62           printf ("The current # attached = %d\n",
 63               buf->shm_nattch);
 64           printf("The in memory # attached = %d\n",
 65               buf->shm_cnattch);
 66           printf("The last shmat time = %d\n",
 67               buf->shm_atime);
 68           printf("The last shmdt time = %d\n",
 69               buf->shm_dtime);
 70           printf("The last change time = %d\n",
 71               buf->shm_ctime);
 72           break;

              /* Lines 73 - 87 deleted */

 88       case 2:    /*Select and change the desired
 89                    member(s) of the data structure.*/
 90           /*Get the original data for this shmid
 91             data structure first.*/
 92           rtrn = shmctl(shmid, IPC_STAT, buf);
 93           printf("\nEnter the number for the\n");
 94           printf("member to be changed:\n");
 95           printf("shm_perm.uid = 1\n");
 96           printf("shm_perm.gid = 2\n");
 97           printf("shm_perm.mode = 3\n");
 98           printf("Entry = ");
 99           scanf("%d", &choice);
100           /*Only one choice is allowed per
101             pass as an illegal entry will
102             cause repetitive failures until
103             shmid_ds is updated with
104             IPC_STAT.*/
105           switch(choice){
106           case 1:
107               printf("\nEnter USER ID = ");
108               scanf ("%d", &uid);
109               buf->shm_perm.uid = uid;
110               printf("\nUSER ID = %d\n",
111                   buf->shm_perm.uid);
112               break;
113           case 2:
114               printf("\nEnter GROUP ID = ");
115               scanf("%d", &gid);
116               buf->shm_perm.gid = gid;
117               printf("\nGROUP ID = %d\n",
118                   buf->shm_perm.gid);
119               break;
120           case 3:
121               printf("\nEnter MODE = ");
122               scanf("%o", &mode);
123               buf->shm_perm.mode = mode;
124               printf("\nMODE = 0%o\n",
125                   buf->shm_perm.mode);
126               break;
127           }
128           /*Do the change.*/
129           rtrn = shmctl(shmid, IPC_SET,
130               buf);
131           break;
132       case 3:    /*Remove the shmid along with its
133                    associated
134                    data structure.*/
135           rtrn = shmctl(shmid, IPC_RMID, NULL);
136           break;
137       case 4:    /*Lock the shared memory segment*/
138           rtrn = shmctl(shmid, SHM_LOCK, NULL);
139           break;
140       case 5:    /*Unlock the shared memory
141                    segment.*/
142           rtrn = shmctl(shmid, SHM_UNLOCK, NULL);
143           break;
144       }
145       /*Perform the following if the call is unsuccessful.*/
146       if(rtrn == -1)
147       {
148           printf ("\nThe shmctl system call failed!\n");
149           printf ("The error number = %d\n", errno);
150       }
151       /*Return the shmid upon successful completion.*/
152       else
153           printf ("\nShmctl was successful for shmid = %d\n",
154               shmid);
155       exit (0);
156   }

Figure 9-16: shmctl(2) System Call Example



Operations for Shared Memory

This section gives a detailed description of using the shmat(2) and 
shmdt(2) system calls, along with an example program which allows all of 
their capabilities to be exercised. 

Using shmop 

The synopsis found in the shmop(2) entry in the Programmer's Reference 
Manual is as follows: 



#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

char *shmat (shmid, shmaddr, shmflg)
int shmid;
char *shmaddr;
int shmflg;

int shmdt (shmaddr)
char *shmaddr;



Attaching a Shared Memory Segment 

The shmat(2) system call requires three arguments to be passed to it, 
and it returns a character pointer value. 

The return value of the system call can be cast to an integer. Upon
successful completion, this value will be the address in core memory where
the process is attached to the shared memory segment; when unsuccessful, it
will be -1.

The shmid argument must be a valid, non-negative, integer value. In
other words, it must have already been created by using the shmget(2)
system call.
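
On current systems shmat is declared to return a void pointer, and a pointer does not necessarily fit in an int, so the -1 failure value described above is usually tested by comparing pointers rather than by casting to an integer. The sketch below shows that check in an attach, write, detach sequence. The helper name write_to_segment is an assumption for this example, and the caller is assumed to have created a segment large enough for the message.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: attach `shmid` at an address chosen by the
 * system, copy `msg` (with its terminating null) into the segment,
 * and detach.  Returns 0 on success, -1 on failure with errno set. */
int write_to_segment(int shmid, const char *msg)
{
    char *addr = shmat(shmid, NULL, 0);   /* shmaddr of zero: system picks */

    if (addr == (char *) -1)    /* failure is signalled by the value -1 */
        return -1;
    strcpy(addr, msg);
    return shmdt(addr);         /* 0 on success, -1 on failure */
}
```

Passing a zero shmaddr, as here, follows the advice to let the operating system pick the attach address.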

The shmaddr argument can be zero or user supplied when passed to the 
shmat(2) system call. If it is zero, the UNIX operating system picks the 
address of where the shared memory segment will be attached. If it is user 
supplied, the address must be a valid address that the UNIX operating sys- 
tem would pick. The following illustrates some typical address ranges; 
these are for the 3B2 Computer: 

0xc00c0000
0xc00e0000
0xc0100000
0xc0120000

Note that these addresses are in chunks of 20,000 hexadecimal. It would
be wise to let the operating system pick addresses so as to improve
portability.

The shmflg argument is used to pass the SHM_RND and
SHM_RDONLY flags to the shmat(2) system call.
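
The effect of SHM_RND on a user-supplied address can be sketched as plain arithmetic. POSIX describes the attach address in that case as shmaddr rounded down to a multiple of SHMLBA, a constant supplied by the shm.h header. The helper name rounded_attach_address is an assumption for this example, and the value of SHMLBA differs from system to system.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: the address a segment would be attached at if
 * `shmaddr` were passed to shmat() together with SHM_RND, that is,
 * shmaddr rounded down to a multiple of SHMLBA. */
uintptr_t rounded_attach_address(uintptr_t shmaddr)
{
    uintptr_t lba = (uintptr_t) SHMLBA;   /* system-dependent boundary */

    return shmaddr - (shmaddr % lba);
}
```

An address that is already a multiple of SHMLBA is returned unchanged; whether the system will actually accept that address for an attach is a separate question that only shmat itself can answer.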

Further details are discussed in the example program for shmop(2). If
you have problems understanding the logic manipulations in this program,
read the "Using shmget" section of this chapter; it goes into more detail
than what would be practical to do for every system call.

Detaching Shared Memory Segments

The shmdt(2) system call requires one argument to be passed to it, and 
shmdt(2) returns an integer value. 

Upon successful completion, zero is returned; and when unsuccessful, 
shmdt(2) returns a -1. 

Further details of this system call are discussed in the example program. 
If you have problems understanding the logic manipulations in this pro- 
gram, read the "Using shmget" section of this chapter; it goes into more 
detail than what would be practical to do for every system call. 

Example Program 

The example program in this section (Figure 9-17) is a menu-driven
program which allows all possible combinations of using the shmat(2) and
shmdt(2) system calls to be exercised.

From studying this program, you can observe the method of passing
arguments and receiving return values. The user-written program
requirements are pointed out.

This program begins (lines 5-9) by including the required header files as
specified by the shmop(2) entry in the Programmer's Reference Manual. Note
that in this program errno is declared as an external variable, and
therefore, the errno.h header file does not have to be included.

Variable and structure names have been chosen to be as close as possible 
to those in the synopsis. Their declarations are self-explanatory. These 
names make the program more readable, and this is perfectly legal since 
they are local to the program. The variables declared for this program and 
their purposes are as follows: 

■ flags— used to store the codes of SHM_RND or SHM_RDONLY for 
the shmat(2) system call 

■ addr— used to store the address of the shared memory segment for 
the shmat(2) and shmdt(2) system calls 

■ i— used as a loop counter for attaching and detaching 

■ attach— used to store the desired number of attach operations 

■ shmid— used to store and pass the desired shared memory segment 
identifier 

■ shmflg— used to pass the value of flags to the shmat(2) system call 

■ retrn— used to store the return values from both system calls 

■ detach— used to store the desired number of detach operations 

This example program combines both the shmat(2) and shmdt(2) system
calls. The program prompts for the number of attachments and enters a
loop until they are done for the specified shared memory identifiers. Then,
the program prompts for the number of detachments to be performed and
enters a loop until they are done for the specified shared memory segment
addresses.

shmat 

The program prompts for the number of attachments to be performed,
and the value is stored at the address of the attach variable (lines 17-21).

A loop is entered using the attach variable and the i counter (lines 23-70)
to perform the specified number of attachments.

In this loop, the program prompts for a shared memory segment 
identifier (lines 24-27) and it is stored at the address of the shmid variable 
(line 28). Next the program prompts for the address where the segment is 
to be attached (lines 30-34), and it is stored at the address of the addr vari- 
able (line 35). Then, the program prompts for the desired flags to be used 
for the attachment (lines 37-44), and the code representing the flags is 
stored at the address of the flags variable (line 45). The flags variable is 
tested to determine the code to be stored for the shmflg variable used to 
pass them to the shmat(2) system call (lines 46-57). The system call is exe- 
cuted (line 60). If successful, a message stating so is displayed along with 
the attach address (lines 66-68). If unsuccessful, a message stating so is 
displayed and the error code is displayed (lines 62, 63). The loop then con- 
tinues until it finishes. 

shmdt 

After the attach loop completes, the program prompts for the number of 
detach operations to be performed (lines 71-75), and the value is stored at 
the address of the detach variable (line 76). 

A loop is entered using the detach variable and the i counter (lines 78-95)
to perform the specified number of detachments.

In this loop, the program prompts for the address of the shared memory 
segment to be detached (lines 79-83), and it is stored at the address of the 
addr variable (line 84). Then, the shmdt(2) system call is performed (line 
87). If successful, a message stating so is displayed along with the address 
that the segment was detached from (lines 92,93). If unsuccessful, the error 
number is displayed (line 89). The loop continues until it finishes. 

The example program for the shmop(2) system calls follows. It is sug- 
gested that the program be put into a source file called shmop.c and then 
into an executable file called shmop. 



 1  /*This is a program to illustrate
 2  **the shared memory operations, shmop(),
 3  **system call capabilities.
 4  */
 5  /*Include necessary header files.*/
 6  #include <stdio.h>
 7  #include <sys/types.h>
 8  #include <sys/ipc.h>
 9  #include <sys/shm.h>
10  /*Start of main C language program*/
11  main()
12  {
13      extern int errno;
14      int flags, addr, i, attach;
15      int shmid, shmflg, retrn, detach;
16      /*Loop for attachments by this process.*/
17      printf("Enter the number of\n");
18      printf("attachments for this\n");
19      printf("process (1-4).\n");
20      printf(" Attachments = ");
21      scanf("%d", &attach);
22      printf("Number of attaches = %d\n", attach);
23      for(i = 1; i <= attach; i++) {
24          /*Enter the shared memory ID.*/
25          printf("\nEnter the shmid of\n");
26          printf("the shared memory segment to\n");
27          printf("be operated on = ");
28          scanf("%d", &shmid);
29          printf("\nshmid = %d\n", shmid);
30          /*Enter the value for shmaddr.*/
31          printf("\nEnter the value for\n");
32          printf("the shared memory address\n");
33          printf("in hexadecimal:\n");
34          printf(" Shmaddr = ");
35          scanf("%x", &addr);
36          printf("The desired address = 0x%x\n", addr);
37          /*Specify the desired flags.*/
38          printf("\nEnter the corresponding\n");
39          printf("number for the desired\n");
40          printf("flags:\n");
41          printf("SHM_RND = 1\n");
42          printf("SHM_RDONLY = 2\n");
43          printf("SHM_RND and SHM_RDONLY = 3\n");
44          printf(" Flags = ");
45          scanf("%d", &flags);
46          switch(flags)
47          {
48          case 1:
49              shmflg = SHM_RND;
50              break;
51          case 2:
52              shmflg = SHM_RDONLY;
53              break;
54          case 3:
55              shmflg = SHM_RND | SHM_RDONLY;
56              break;
57          }
58          printf("\nFlags = 0%o\n", shmflg);
59          /*Do the shmat system call.*/
60          retrn = (int)shmat(shmid, addr, shmflg);
61          if(retrn == -1) {
62              printf("\nShmat failed. ");
63              printf("Error = %d\n", errno);
64          }
65          else {
66              printf("\nShmat was successful\n");
67              printf("for shmid = %d\n", shmid);
68              printf("The address = 0x%x\n", retrn);
69          }
70      }
71      /*Loop for detachments by this process.*/
72      printf("Enter the number of\n");
73      printf("detachments for this\n");
74      printf("process (1-4).\n");
75      printf(" Detachments = ");
76      scanf("%d", &detach);
77      printf("Number of detaches = %d\n", detach);
78      for(i = 1; i <= detach; i++) {
79          /*Enter the value for shmaddr.*/
80          printf("\nEnter the value for\n");
81          printf("the shared memory address\n");
82          printf("in hexadecimal:\n");
83          printf(" Shmaddr = ");
84          scanf("%x", &addr);
85          printf("The desired address = 0x%x\n", addr);
86          /*Do the shmdt system call.*/
87          retrn = (int)shmdt(addr);
88          if(retrn == -1) {
89              printf("Error = %d\n", errno);
90          }
91          else {
92              printf("\nShmdt was successful\n");
93              printf("for address = 0x%x\n", addr);
94          }
95      }
96  }


Figure 9-17: shmop() System Call Example

Screen management programs are a common component of many commercial
computer applications. These programs handle input and output at
a video display terminal. A screen program might move a cursor, print a 
menu, divide a terminal screen into windows, change the definitions of 
colors, or draw a display on the screen to help users enter and retrieve 
information from a database. 

This chapter explains how to use the curses library, terminfo database, 
and the terminfo support routines to write terminal-independent screen 
management programs on a UNIX system. It also contains information on 
other UNIX system tools that support the curses/terminfo screen manage- 
ment system. 

The purpose of this chapter is to explain how to write screen manage- 
ment programs as quickly as possible. Therefore, it does not attempt to 
cover every detail. Use this chapter to get familiar with the way these rou- 
tines work, then use the Programmer's Reference Manual for more informa- 
tion. 

Before attempting to use curses/terminfo, or to understand this docu- 
ment, you should be familiar with the following: 

■ The C programming language 

■ The UNIX system/C language standard I/O package stdio(3S)

■ "System Calls and Subroutines" and "Input/Output" in Chapter 2 of
the Programmer's Guide

Organization of this Chapter 

This chapter has five sections: 

■ Overview 

This section briefly describes curses, terminfo, and other related 
UNIX system support tools. 

■ Working with curses Routines 

This section contains a brief description of the routines that make up 
the curses(3X) library. These routines write to a screen, read from a 
screen, build windows, draw line graphics, use a terminal's soft 
labels, manipulate colors, and work with more than one terminal at
the same time. Examples are included. 

■ Working with terminfo Routines

This section describes a subset of routines in the curses library. 
These routines access and manipulate data in the terminfo database. 
They are used to set up and handle special terminal capabilities such 
as programmable function keys. 

■ Working with the terminfo Database 

This section describes the terminfo database, related support tools, 
and their relationship to the curses library. 

■ curses Program Examples 

This section includes programs that illustrate uses of curses routines. 

What is curses?

curses(3X) is the library of routines that you use to write screen 
management programs on the UNIX system. The routines are C functions 
and macros, with an argument syntax modeled after routines in the standard
C library. For example, the routine printw( ) uses the same arguments
as printf(3S). The only difference is that printw( ) sends its output to stdscr
(defined in <curses.h>) instead of stdout. The routine getch( ) and the
standard getc(3S) are related in the same manner. The automatic teller pro- 
gram at your bank might use printw( ) to print its menus and getch( ) to 
accept your request for a withdrawal. 

The curses routines are usually located in /usr/lib/libcurses.a. To compile
a program using these routines, you must use the cc(1) command and
include -lcurses on the command line to tell the link editor to locate, load,
and link them:

cc file.c -lcurses -o file

The name curses comes from the cursor optimization that this library of 
routines provides. Cursor optimization minimizes the amount a cursor has 
to move around a screen to update it. For example, if you designed a screen 
editor program with curses routines and edited the sentence 

curses/terminfo is a great package for creating screens.

to read 

curses/terminfo is the best package for creating screens. 

the program would not output the whole line, but only the part from the 
best on. The other characters would be preserved. Because the amount of 
data transmitted — the output — is minimized, cursor optimization is also 
referred to as output optimization. 
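
The idea can be sketched in a few lines of C. The function below is purely illustrative and is not how the curses library itself is implemented; first_difference( ) is a hypothetical helper that finds the first column at which the old and new versions of a line differ, which is where an update of that line would have to begin.

```c
#include <stddef.h>

/* Hypothetical helper, not part of curses: return the index of the
 * first character at which two lines differ.  If the lines are
 * identical, the length of the line is returned. */
size_t first_difference(const char *old_line, const char *new_line)
{
    size_t i = 0;

    while (old_line[i] != '\0' && old_line[i] == new_line[i])
        i++;
    return i;
}
```

For the two sentences in the example, first_difference( ) returns 19, the column where "a great" gives way to "the best"; nothing to the left of that column needs to be sent again.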

Cursor optimization takes care of updating the screen in a manner 
appropriate for the terminal on which a curses program is run. This means 
that the curses library can do whatever is required to update many different 
terminal types. It searches the terminfo database (described below) to find 
the correct information about a terminal. 

How does cursor optimization help you and those who use your programs?
First, it saves you programming time when describing how you
want to update a screen. Second, it saves a user's time when the screen is 
updated. Third, it reduces the load on your UNIX system's communication 
lines when the updating takes place. Fourth, you don't have to worry about 
the myriad of terminals on which your program might be run. 

Here's a simple curses program. It uses the mandatory #include
<curses.h>, initscr( ), and endwin( ) routines, along with some of the basic
curses routines, to move a cursor to the middle of any terminal screen and
print the character string BullsEye. Each of the curses routines is described
in the following section, "Working with curses Routines". LINES and COLS
are variables declared in <curses.h>.



#include <curses.h>

main()
{
    initscr();

    move( LINES/2 - 1, COLS/2 - 4 );
    addstr("Bulls");
    refresh();
    addstr("Eye");
    refresh();
    endwin();
}



Figure 10-1: A Simple curses Program 

What is terminfo?

terminfo refers to both of the following: 

■ It is a group of routines within the curses library that handles certain 
terminal capabilities. You can use these terminfo routines to write a 
filter or program the function keys, if your terminal has programmable
keys. Shell programmers can use the command tput(1) to perform
many of the manipulations provided by these routines.

■ It is a database containing the descriptions of many terminals that 
can be used with curses programs. These descriptions specify the 
capabilities of a terminal and the way it performs various 
operations — for example, how many lines and columns it has and 
how its control characters are interpreted. 

Each terminal description in the database is a separate, compiled file. 
You use the source code that terminfo(4) describes to create these 
files and the command tic(1M) to compile them.

The compiled files are normally located in the directories 
/usr/lib/terminfo/?. These directories have single character names, 
each of which is the first character in the name of a terminal. For 
example, an entry for the AT&T Teletype 5425 is normally located in 
the file /usr/lib/terminfo/a/att5425. 

Here's a simple shell script that uses the terminfo database. 

# Clear the screen and show the 0,0 position.
tput clear
tput cup 0 0 # or tput home
echo "<- this is 0 0"

# Show line 5, column 10.
tput cup 5 10
echo "<- this is 5 10"



Figure 10-2: A Shell Script Using terminfo Routines 



How curses and terminfo Work Together 

A screen management program linked with curses routines uses infor- 
mation from the terminfo database at run time. The information tells 
curses what it needs to know about the terminal being used— what we'll 
call the current terminal from here on. 

For example, suppose you are using an AT&T Teletype 5425 terminal to 
run the simple curses program shown in Figure 10-1. To put the BullsEye
in the middle of the screen, the program needs to know how many lines 
and columns the terminal can display. The description of the AT&T Tele- 
type 5425 in the terminfo database has this information. All the curses pro- 
gram needs to know before it goes looking for the information is the name 
of your terminal. 

The initscr( ) routine at the beginning of a curses program calls the ter- 
minfo routine setupterm( ). If setupterm( ) is not told which terminal to set 
up for, it looks at the shell environment variable TERM, which is usually 
set and exported by your .profile when you log in (see profile(4)). Know- 
ing the value of TERM, it then finds and opens the correct terminal 
description entry in the terminfo database. Then, when your program calls 
a curses routine, information that the routine needs concerning the
terminal's capabilities is available. 

Assume that the following example lines are in a .profile: 

TERM=5425 
export TERM
tput init 

There can be other statements between them, but these three statements 
must appear in this order. The first line stores the terminal name in the 
shell environment variable TERM. The second line exports the variable. 
The third line is a terminfo routine that initializes the current terminal. 
That is, it makes sure that the terminal is set up according to its description 
in the terminfo database. If you had these lines in your .profile and you 
ran a curses program, the program would get the information that it needs 
about your terminal from the file /usr/lib/terminfo/a/att5425.

Other Components of the Screen Management 
System 

You have been given a brief look at the main components of the
curses/terminfo method of screen management. This section will complete
the overview by making you familiar with the other components of this
system.

Component        Brief Description

terminfo         Files found under /usr/lib/terminfo/?/*; these files
database         contain compiled terminal descriptions. ? is the first
                 letter of the terminal name, and * is the terminal
                 name.

tic(1M)          terminfo(4) defines terminal description source files;
                 tic compiles them into terminfo database files.

infocmp(1M)      A routine that prints and compares compiled terminfo
                 description files.

captoinfo(1M)    A routine that converts old termcap files to terminfo
                 database files.

terminfo(4)      Defines both the terminfo database files and the
                 routines used to manipulate and instantiate the strings
                 of data in those files.

tput(1)          A terminfo routine that causes a string from the
                 terminfo database to be sent to the terminal, thus
                 setting one or more parameters.

curses(3X)       A library of C routines that uses information in the
                 terminfo database. The routines are terminal
                 independent. They optimize cursor movement and
                 allow for the easy programming of screen handling
                 code.

Other manual     layers(1), stdio(3S), profile(4), scr_dump(4), term(4),
pages to read    term(5)

Figure 10-3: Components of the curses/terminfo Screen Management System



The terminfo database has already been described as one of the main
components of the curses/terminfo screen management system. The rules
for creating a terminal description source file are in the manual page
terminfo(4). The source file is then compiled using tic. Unless you have
created a shell environment variable called TERMINFO that indicates a
different path, tic will place the compiled description file into the proper
directory under /usr/lib/terminfo (provided that you have permission to
create or overwrite files in that directory). To use tic simply type:

tic file_name

You may use the -v option to get a running commentary. An integer from 
1 to 10 may follow the option (no space) to set the level of verbosity. The 
default is 1. 

The system uses the shell environment variable TERMINFO to find the 
terminal description files. Initializing a terminal will cause TERMINFO to 
be set to /usr/lib/terminfo unless you have already set it to some other 
path ($HOME/bin, for example). The system will look for the definition of 
a specific terminal under $TERMINFO/?/*, where ? is the first letter of the 
terminal name, and * is the terminal name.
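
The naming rule can be sketched in C as follows. This is only an illustration of the path layout described here, not the code the system uses; the function name terminfo_path( ) is our own invention, and the real lookup may do more than this sketch shows.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build the path of the compiled description for
 * terminal `name`, honoring $TERMINFO and falling back to the default
 * directory named in the text.  Returns 0 on success, -1 if the name
 * is empty or the result does not fit in buf. */
int terminfo_path(const char *name, char *buf, size_t size)
{
    const char *dir = getenv("TERMINFO");
    int n;

    if (name == NULL || name[0] == '\0')
        return -1;
    if (dir == NULL || dir[0] == '\0')
        dir = "/usr/lib/terminfo";

    /* directory / first letter of the name / full name */
    n = snprintf(buf, size, "%s/%c/%s", dir, name[0], name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With TERMINFO unset, the name att5425 yields /usr/lib/terminfo/a/att5425, the file mentioned earlier in this chapter.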

Once a terminal description file has been compiled, it is no longer
human readable. The routine infocmp translates a compiled description file
back to source statements. Invoking the command without arguments will
print out the description of the terminal defined by the shell environment
variable TERM. A single argument is taken as the name of a terminal you
want to see the source description for. With no options declared (or -I),
you will see descriptions as defined in terminfo(4). There are options for
seeing the C variable names (-L), the old termcap names (-C), and all
output in termcap format (-r).

If two arguments are given, infocmp assumes they identify two descriptions
you want to compare. If no options are given (or -d), the differences
are printed. You may also ask for a list of capabilities that the two have in
common (-c) or a list of capabilities that neither describes (-n). In all of
the above cases, the output lists the boolean fields first, the numeric fields
second, and the strings third.

infocmp also has options to print, trace, sort, compare files in two 
different directories, and output a source file derived from the union of two 
or more compiled description files. For more information consult your Sys- 
tem Administrator's Reference Manual. 

Early versions of the UNIX system used a different method of describ- 
ing terminals, called termcap. You can convert a termcap file to a terminfo 
file by using captoinfo. If the command is invoked with no arguments, the 
shell environment variable TERMCAP is used to get the path and the shell 
environment variable TERM to get the terminal. If TERMCAP is null, the 
routine tries to convert /etc/termcap. If a file name is given as an option, 
that is the file that will be converted. The output is to standard out, and 
may be piped. Options include a trace mode (-v), one field to a line output
(-1), and changing the output width (-w).

One of the definitions given earlier for terminfo was that it is a group
of routines within curses that allow you to manipulate the data in a terminal
description file. This small library of routines is documented later in
this chapter and in the curses(3X) manual page. The command tput(1) will
allow you to perform many of these manipulations from the command line
or in a shell script.

tput can always be given the -Tterminaltype option, but doesn't need it
if the shell environment variable TERM is set. It can be given init, reset,
or longname as special arguments. These initialize, reset, and print out the
name of the terminal, respectively. Finally, you can use the name of a
terminfo(4) terminal attribute or capability (called a capname) as an argument.
These capabilities fall into three categories: boolean, numeric, and
string. If the capname you specify is a string, you may include, as an
argument, a list of parameters to insert into coded places in the string
(instantiation). In Figure 10-2, cup is the capname of the string the computer
sends to your terminal to move the cursor to a particular position. tput calls
a routine that instantiates the 5 and 10 into the proper places in that string,
and then sends the string to your terminal, thus moving the cursor to that
position.

This completes the overview of the curses/terminfo screen management
system. A more detailed description of each component starts in the next 
section. If you elect to skip this and go directly to the manual pages, 
remember that the examples at the end of the chapter might still prove use- 
ful. 

This section describes the basic curses routines for creating interactive
screen management programs. It begins by describing the routines and 
other program components that every curses program needs to work prop- 
erly. Then it tells you how to compile and run a curses program. Finally, 
it describes the most frequently used curses routines that 

■ write output to and read input from a terminal screen 

■ control the data output and input — for example, to print output in 
bold type or prevent it from echoing (printing back on a screen) 

■ manipulate colors on alphanumeric terminals 

■ manipulate multiple screen images (windows) 

■ draw simple graphics 

■ manipulate soft labels on a terminal screen 

■ send output to and accept input from more than one terminal. 

To illustrate the effect of using these routines, we include simple exam- 
ple programs as the routines are introduced. We also refer to a group of 
larger examples located in the section "curses Program Examples" in this 
chapter. These larger examples are more challenging; they sometimes make 
use of routines not discussed here. Keep the curses(3X) manual page 
handy. 



What Every curses Program Needs 

Every curses program needs to include the header file <curses.h> and 
call the routines initscr( ), refresh( ) or similar related routines, and
endwin( ). 

The Header File <curses.h> 

The header file <curses.h> defines several global variables and data 
structures and defines several curses routines as macros. 

To begin, let's consider the variables and data structures defined.
<curses.h> defines all the parameters used by curses routines. It also
defines the integer variables LINES and COLS; when a curses program is
run on a particular terminal, these variables are assigned the vertical and
horizontal dimensions of the terminal screen, respectively, by the routine
initscr( ) described below. The header file defines the constants OK and
ERR, too. The integer variables COLORS and COLOR_PAIRS are also
defined in <curses.h>. These will be assigned, respectively, the maximum
number of colors and color-pairs the terminal can support. These variables
are initialized by the start_color( ) routine. (See the section "Color
Manipulation.") Most curses routines have return values; the OK value is
returned if a routine is properly completed, and the ERR value if some error
occurs.



NOTE 



LINES and COLS are external (global) C variables that represent the size 
of a terminal screen. Two shell environment variables, LINES and 
COLUMNS, may be set in a user's shell environment; a curses program 
uses the shell environment variables, if they exist, as the default values 
for the C variables. If they do not exist, the C variables are set by reading 
the terminfo database. To avoid confusion, for the remainder of this 
document we will use the $ to distinguish environment variables from 
the C variables. 
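
The order of precedence described in this note can be sketched as follows. The helper screen_dimension( ) is hypothetical, and its fallback argument merely stands in for the value that would be read from the terminfo database; ignoring values that are not positive integers is an assumption of this sketch, not documented behavior.

```c
#include <stdlib.h>

/* Hypothetical helper: return the value of the environment variable
 * `name` if it is set to a positive integer; otherwise return
 * `fallback`, which stands in for the value that would be read from
 * the terminfo database. */
int screen_dimension(const char *name, int fallback)
{
    const char *value = getenv(name);
    int n;

    if (value == NULL)
        return fallback;
    n = atoi(value);
    return n > 0 ? n : fallback;
}
```

A call such as screen_dimension("LINES", 24) would return 30 if $LINES were set to 30, and 24 if $LINES were not set at all.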



Now let's consider the macro definitions. <curses.h> defines many 
curses routines as macros that call other macros or curses routines. For 
instance, the simple routine refresh( ) is a macro. The line 

#define refresh( ) wrefresh(stdscr) 

from <curses.h> shows that when refresh( ) is called, it is expanded to call
the curses routine wrefresh( ). The latter routine in turn calls the two
curses routines wnoutrefresh( ) and doupdate( ). Many other routines also
group two or three routines together to achieve a particular result.

Macro expansion in curses programs may cause problems with certain
sophisticated C features, such as the use of automatic incrementing
variables.
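
The kind of problem meant here can be shown with a macro of our own. LARGER( ) below is hypothetical and is not taken from <curses.h>; it simply mentions one of its arguments twice, as macros often do, so an incrementing argument is incremented twice.

```c
/* Hypothetical macro, not taken from <curses.h>: like many macros,
 * it mentions one of its arguments more than once. */
#define LARGER(a, b) ((a) > (b) ? (a) : (b))

/* Returns the final value of i.  Through a true function call the
 * increment would happen once; through the macro it happens twice. */
int increments_through_macro(void)
{
    int i = 5;
    int larger = LARGER(i++, 3);  /* ((i++) > (3) ? (i++) : (3)) */

    (void)larger;                 /* larger is 6 here, not 5 */
    return i;                     /* 7: i was incremented twice */
}
```

When a curses routine is a macro, it is safest to do the incrementing in a separate statement and pass the plain variable to the routine.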

One final point about <curses.h>: it automatically includes <stdio.h>
and the <termio.h> tty driver interface file. Including either file again in 
a program is harmless but wasteful. 

The Routines initscr(), refresh(), endwin() 

The routines initscr( ), refresh( ), and endwin( ) initialize a terminal
screen to an "in curses state," update the contents of the screen, and restore
the terminal to an "out of curses state," respectively. Use the simple program
that we introduced earlier to learn about each of these routines:




#include <curses.h>

main()
{
    initscr();   /* initialize terminal settings and <curses.h>
                    data structures and variables */

    move( LINES/2 - 1, COLS/2 - 4 );
    addstr("Bulls");
    refresh();   /* send output to (update) terminal screen */
    addstr("Eye");
    refresh();   /* send more output to terminal screen */
    endwin();    /* restore all terminal settings */
}




Figure 10-4: The Purposes of initscr( ), refresh( ), and endwin( ) in a Pro- 
gram 



A curses program usually starts by calling initscr( ); the program should 
call initscr( ) only once. Using the shell environment variable $TERM as 
the section "How curses and terminfo Work Together" describes, this rou- 
tine determines what terminal is being used. It then initializes all the 
declared data structures and other variables from <curses.h>. For example, 
initscr( ) would initialize LINES and COLS for the sample program on 
whatever terminal it was run. If the Teletype 5425 were used, this routine
would initialize LINES to 24 and COLS to 80. Finally, this routine writes
error messages to stderr and exits if errors occur. 

During the execution of the program, output and input are handled by
routines like move( ) and addstr( ) in the sample program. For example,

move( LINES/2 - 1, COLS/2 - 4 );

says to move the cursor to the left of the middle of the screen. Then the
line

addstr("Bulls");

says to write the character string Bulls. For example, if the Teletype 5425
were used, these routines would position the cursor and write the character
string at (11,36).
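
The arithmetic behind (11,36) can be checked with two small functions. They are hypothetical helpers written for this illustration, not curses routines; the sample program hard-codes 4 because that is half the length of BullsEye.

```c
#include <string.h>

/* Hypothetical helpers that repeat the arithmetic of the sample
 * program; they are not part of the curses library. */

/* Row used by the sample program; rows are numbered from 0. */
int center_row(int lines)
{
    return lines / 2 - 1;
}

/* Starting column that centers the given text on a line. */
int center_col(int cols, const char *text)
{
    return cols / 2 - (int)strlen(text) / 2;
}
```

On a terminal with 24 lines and 80 columns, center_row(24) is 11 and center_col(80, "BullsEye") is 36.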



NOTE 



All curses routines that move the cursor move it from its home position 
in the upper left corner of a screen. The (LINES,COLS) coordinate at this 
position is (0,0) not (1,1). Notice that the vertical coordinate is given first 
and the horizontal second, which is the opposite of the more common 
(x,y) order of screen (or graph) coordinates. The -1 in the sample pro- 
gram takes the (0,0) position into account to place the cursor on the 
center line of the terminal screen. 



Routines like move( ) and addstr( ) do not actually change a physical terminal
screen when they are called. The screen is updated only when
refresh( ) is called. Before this, an internal representation of the screen
called a screen buffer or window is updated. This is a very important concept,
which we discuss below under "More about refresh( ) and Windows."

Finally, a curses program ends by calling endwin( ). This routine 
restores all terminal settings and positions the cursor at the lower left 
corner of the screen. 



Compiling a curses Program 

You compile programs that include curses routines as C language programs
using the cc(1) command (documented in the Programmer's Reference
Manual), which invokes the C compiler (see Chapter 2 in this guide for
details).

The routines are usually stored in the library /usr/lib/libcurses.a. To
direct the link editor to search this library, you must use the -l option with
the cc command.

The general command line for compiling a curses program follows:

cc file.c -lcurses -o file

file.c is the name of the source program; and file is the executable object 
module. 



Running a curses Program 

curses programs count on certain information being in a user's environ- 
ment to run properly. Specifically, users of a curses program should usually 
include the following three lines in their .profile files: 

TERM=current terminal type
export TERM 
tput init 

For an explanation of these lines, see the section "How curses and
terminfo Work Together" in this chapter. Users of a curses program could also
define the environment variables $LINES, $COLUMNS, and $TERMINFO
in their .profile files. However, unlike $TERM, these variables do not have
to be defined.

If a curses program does not run as expected, you might want to debug 
it. Keep a few points in mind. 

First, a curses program is interactive and should always have knowledge 
of where the cursor is located. It is suggested that you do not use an 
interactive debugger as they usually cause changes to the contents of the 
screen without telling curses about it. 

Second, a curses program outputs to a screen buffer until refresh( ) or a
similar routine is called. Because visual output from the program is not
immediate, debugging may be confusing.

Third, you cannot set breakpoints on curses routines that are macros,
such as refresh( ). You have to set the breakpoint on routines used to define
these macros. For example, you have to use wrefresh( ) instead of refresh( ).
See the above section, "The Header File <curses.h>," for more information
about macros.






More about refresh( ) and Windows 

As mentioned above, curses routines do not update the terminal screen
until refresh( ) is called. Instead, they write to an internal representation of
the screen called a screen buffer or window. When refresh( ) is called, all
the accumulated output is sent from the window to the current terminal
screen.

<curses.h> supplies a default window called standard screen. The C
name for this is stdscr. It is the size of the current terminal's screen.
<curses.h> defines stdscr to be of the type WINDOW *, a pointer to a C
structure which is a two-dimensional array of characters representing the
terminal screen. The program always keeps track of what is on the physical
(terminal) screen, as well as what is in stdscr. When refresh( ) is called, it
compares the two screen images and sends a stream of characters to the
terminal that make the physical screen look like stdscr. A curses program
considers many different ways to do this, taking into account the various
capabilities of the terminal and similarities between what is on the screen and
what is in the window. It optimizes output by printing as few characters as
possible. Figure 10-5 illustrates what happens when you execute the sample
curses program given in Figure 10-1. Notice in the figure that the terminal
screen retains whatever garbage is on it until the first refresh( ) is called.
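
The two-image idea can be modeled with a short sketch. It is a simplification under stated assumptions: the names beginning with sk_ and SK_ are hypothetical, the screen size is fixed at 24 by 80, and the sketch only counts the character cells that change. The real library also weighs the terminal's capabilities when deciding what to send.

```c
#include <string.h>

#define SK_LINES 24            /* screen size assumed for this sketch */
#define SK_COLS  80

static char window[SK_LINES][SK_COLS];    /* plays the role of stdscr */
static char physical[SK_LINES][SK_COLS];  /* what the terminal shows */

/* Fill both images with the given characters. */
void sk_init(char win_fill, char phys_fill)
{
    memset(window, win_fill, sizeof window);
    memset(physical, phys_fill, sizeof physical);
}

/* Write a string into the window only; the terminal is not touched.
 * y is assumed to be a valid row; characters that would fall off the
 * right edge are dropped. */
void sk_addstr(int y, int x, const char *s)
{
    while (*s != '\0' && x < SK_COLS)
        window[y][x++] = *s++;
}

/* Make the physical image match the window and return the number of
 * character cells that had to change. */
int sk_refresh(void)
{
    int y, x, changed = 0;

    for (y = 0; y < SK_LINES; y++)
        for (x = 0; x < SK_COLS; x++)
            if (physical[y][x] != window[y][x]) {
                physical[y][x] = window[y][x];
                changed++;
            }
    return changed;
}
```

If both images start out blank, adding Bulls at (11,36) and refreshing changes 5 cells; adding Eye at (11,41) and refreshing again changes only 3, because the cells holding Bulls already match.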

Routine called                 stdscr        physical screen

initscr()                      (empty)       (garbage)
move(LINES/2-1, COLS/2-4)      (empty)       (garbage)
addstr("Bulls")                Bulls         (garbage)
refresh()                      Bulls         Bulls
addstr("Eye")                  BullsEye      Bulls
refresh()                      BullsEye      BullsEye
endwin()                       BullsEye      BullsEye

Figure 10-5: The Relationship between stdscr and a Terminal Screen



You can create other windows and use them instead of stdscr. Win- 
dows are useful for maintaining several different screen images. For exam- 
ple, many data entry and retrieval applications use two windows: one to 
control input and output and one to print error messages that don't mess up 
the other window. 

It's possible to subdivide a screen into many windows, refreshing each 
one of them as desired. When windows overlap, the contents of the current 
screen show the most recently refreshed window. It's also possible to create 
a window within a window; the smaller window is called a subwindow. 
Assume that you are designing an application that uses forms, for example 
an expense voucher, as a user interface. You could use subwindows to con- 
trol access to certain fields on the form. 

Some curses routines are designed to work with a special type of window
called a pad. A pad is a window whose size is not restricted by the
size of a screen or associated with a particular part of a screen. You can use 
a pad when you have a particularly large window or only need part of the 
window on the screen at any one time. For example, you might use a pad 
for an application with a spread sheet. 

Figure 10-6 represents what a pad, a subwindow, and some other windows
could look like in comparison to a terminal screen.




Figure 10-6: Multiple Windows and Pads Mapped to a Terminal Screen 



The section "Building Windows and Pads" in this chapter describes the 
routines you use to create and use them. If you'd like to see a curses pro- 
gram with windows now, you can turn to the window program under the 
section "curses Program Examples" in this chapter. 

Getting Simple Output and Input

Output 

The routines that curses provides for writing to stdscr are similar to 
those provided by the stdio(3S) library for writing to a file. They let you 

■ write a character at a time — addch( ) 

■ write a string — addstr( ) 

■ format a string from a variety of input arguments — printw( ) 

■ move a cursor or move a cursor and print character(s) — move( ), 
mvaddch( ), mvaddstr( ), mvprintw( ) 

■ clear a screen or a part of it — clear( ), erase( ), clrtoeol( ), clrtobot( ) 

Following are descriptions and examples of these routines.

The curses library provides its own set of output and input functions. 
You should not use other I/O routines or system calls, like read(2) and 
write(2), in a curses program. They may cause undesirable results when 
you run the program.

addch( )

SYNOPSIS 

#include <curses.h> 

int addch(ch) 
chtype ch; 

NOTES 

■ addch( ) writes a single character to stdscr. 

■ The character is of the type chtype, which is defined in <curses.h>. 
chtype contains data and attributes (see "Output Attributes" in this 
chapter for information about attributes). 

■ When working with variables of this type, make sure you declare 
them as chtype and not as the basic type (for example, short) that 
chtype is declared to be in <curses.h>. This will ensure future com- 
patibility. 

■ addch( ) does some translations. For example, it converts 

□ the <NL> character to a clear to end of line and a move to the 
next line 

□ the tab character to an appropriate number of blanks 

□ other control characters to their ^X notation

■ addch( ) normally returns OK. The only time addch( ) returns ERR is 
after adding a character to the lower right-hand corner of a window 
that does not scroll. 

■ addch( ) is a macro.

EXAMPLE

#include <curses.h>

main( )
{

    initscr( );
    addch('a');
    refresh( );
    endwin( );

}

The output from this program will appear as follows:

a

$□


Also see the show program under "curses Example Programs" in this 
chapter. 
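
To make the translation rules in the NOTES above more concrete, here is a minimal sketch in plain C of how a tab or control character might be turned into printable text. The function name printable_form( ) and the tab stops every eight columns are assumptions made for illustration; this is not the curses implementation, and it does not handle <NL>, which addch( ) treats as a screen operation rather than as text.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of curses: write the printable form of
 * character c into out (which must hold at least 9 bytes) and return
 * the number of characters written. col is the current column; tab
 * stops are assumed to fall every 8 columns.
 */
int printable_form(int c, int col, char *out)
{
    int n = 0;

    assert(out != NULL && col >= 0 && c != '\n');
    if (c == '\t') {
        int blanks = 8 - (col % 8);   /* pad to the next tab stop */
        while (blanks-- > 0)
            out[n++] = ' ';
    } else if (c >= 0 && c < 32) {
        out[n++] = '^';               /* control character: ^X notation */
        out[n++] = (char)(c + '@');   /* for example, CTRL-A (1) gives 'A' */
    } else {
        out[n++] = (char)c;           /* ordinary character, unchanged */
    }
    out[n] = '\0';
    return n;
}
```

With these assumptions, CTRL-A would be shown as ^A, and a tab typed in column 3 would become five blanks.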

addstr( )

SYNOPSIS 

#include <curses.h> 

int addstr(str) 
char *str;

NOTES 

■ addstr( ) writes a string of characters to stdscr. 

■ addstr( ) calls addch( ) to write each character. 

■ addstr( ) follows the same translation rules as addch( ). 

■ addstr( ) returns OK on success and ERR on error. 

■ addstr( ) is a macro. 

EXAMPLE 

Recall the sample program that prints the character string BullsEye. 
See Figures 10-1, 10-2, and 10-5.

printw( )

SYNOPSIS 

#include <curses.h> 

int printw(fmt [, arg...])
char *fmt;

NOTES 

■ printw( ) handles formatted printing on stdscr. 

■ Like printf(3S), printw( ) takes a format string and a variable number of
arguments.

■ Like addstr( ), printw( ) calls addch( ) to write the string. 

■ printw( ) returns OK on success and ERR on error.

EXAMPLE

#include <curses.h>

main( )
{

    char* title = "Not specified";
    int no = 0;

    /* Missing code. */

    initscr( );

    /* Missing code. */

    printw("%s is not in stock.\n", title);
    printw("Please ask the cashier to order %d for you.\n", no);

    refresh( );
    endwin( );

}

The output from this program will appear as follows:

Not specified is not in stock.
Please ask the cashier to order 0 for you.

$□

move( )

SYNOPSIS

#include <curses.h>

int move(y, x)
int y, x;



NOTES

■ move( ) positions the cursor for stdscr at the given row y and the
given column x.

■ Notice that move( ) takes the y coordinate before the x coordinate.
The upper left-hand coordinates for stdscr are (0,0), the lower
right-hand (LINES - 1, COLS - 1). See the section "The Routines initscr( ),
refresh( ), and endwin( )" for more information.

■ move( ) may be combined with the write functions to form

□ mvaddch( y, x, ch ), which moves to a given position and prints a
character

□ mvaddstr( y, x, str ), which moves to a given position and prints
a string of characters

□ mvprintw( y, x, fmt [, arg...] ), which moves to a given position
and prints a formatted string.

■ move( ) returns OK on success and ERR on error. Trying to move to
a screen position of less than (0,0) or more than (LINES - 1, COLS - 1)
causes an error.

■ move( ) is a macro.

EXAMPLE

#include <curses.h>

main( )
{

    initscr( );
    addstr("Cursor should be here --> if move( ) works.");
    printw("\n\n\nPress <CR> to end test.");
    move(0,25);
    refresh( );
    getch( );    /* Gets <CR>; discussed below. */
    endwin( );

}

Here's the output generated by running this program:

Cursor should be here -->□if move( ) works.


Press <CR> to end test.

After you press <CR>, the screen looks like this:

Cursor should be here -->


Press <CR> to end test.
$□

See the scatter program under "curses Program Examples" in this chapter for
another example of using move( ).
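
The range rule in the NOTES above can be written as a small predicate. This is only a sketch: position_ok( ) is a hypothetical helper, not a curses routine, and it takes the screen size as ordinary arguments where a real program would use LINES and COLS.

```c
#include <assert.h>

/*
 * Hypothetical helper: returns 1 if (y, x) lies between (0,0) and
 * (lines - 1, cols - 1) inclusive, the range in which the text says
 * move( ) succeeds, and 0 otherwise.
 */
int position_ok(int y, int x, int lines, int cols)
{
    assert(lines > 0 && cols > 0);
    return y >= 0 && y <= lines - 1 && x >= 0 && x <= cols - 1;
}
```

On a 24-line, 80-column screen, for example, (0,25) and (23,79) pass this check, while (24,0) does not.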




clear( ) and erase( )

SYNOPSIS

#include <curses.h>

int clear( ) 
int erase( ) 

NOTES 

■ Both routines change stdscr to all blanks. 

■ clear( ) also assumes that the screen may have garbage that it doesn't
know about; this routine first calls erase( ) and then clearok( ), which
clears the physical screen completely on the next call to refresh( ) for
stdscr. See the curses(3X) manual page for more information about
clearok( ).

■ initscr( ) automatically calls clear( ).

■ clear( ) always returns OK; erase( ) returns no useful value.

■ Both routines are macros.

clrtoeol( ) and clrtobot( )

SYNOPSIS

#include <curses.h>

int clrtoeol( )
int clrtobot( ) 

NOTES 

■ clrtoeol( ) changes the remainder of a line to all blanks. 

■ clrtobot( ) changes the remainder of a screen to all blanks. 

■ Both begin at the current cursor position, inclusive.

■ Neither returns any useful value.

EXAMPLE

#include <curses.h>

main( )
{

    initscr( );
    noecho( );
    addstr("Press <CR> to delete from here to the end of the line and on.");
    addstr("\nDelete this too.\nAnd this.");
    move(0,30);
    refresh( );
    getch( );
    clrtobot( );
    refresh( );
    endwin( );

}

Here's the output generated by running this program:

Press <CR> to delete from here□to the end of the line and on.
Delete this too.
And this.

Notice the two calls to refresh( ): one to send the full screen of text to a
terminal, the other to clear from the position indicated to the bottom of a
screen.

Here's what the screen looks like when you press <CR>:

Press <CR> to delete from here
$□

See the show and two programs under "curses Example Programs" for
examples of uses for clrtoeol( ).



Input

curses routines for reading from the current terminal are similar to
those provided by the stdio(3S) library for reading from a file. They let you

■ read a character at a time - getch( )

■ read a <NL>-terminated string - getstr( )

■ parse input, converting and assigning selected data to an argument
list - scanw( )

The primary routine is getch( ), which processes a single input character
and then returns that character. This routine is like the C library routine
getchar(3S) except that it makes several terminal- or system-dependent
options available that are not possible with getchar( ). For example, you can
use getch( ) with the curses routine keypad( ), which allows a curses
program to interpret extra keys on a user's terminal, such as arrow keys,
function keys, and other special keys that transmit escape sequences, and treat
them as just another key. See the descriptions of getch( ) and keypad( ) on
the curses(3X) manual page for more information about keypad( ).

The following pages describe and give examples of the basic routines
for getting input in a screen program.

getch( )

SYNOPSIS

#include <curses.h> 
int getch( ) 

NOTES 

■ getch( ) reads a single character from the current terminal. 

■ getch( ) returns the value of the character or ERR on end of file,
receipt of signals, or a non-blocking read with no input.

■ getch( ) is a macro. 

■ See the discussions about echo( ), noecho( ), cbreak( ), nocbreak( ),
raw( ), noraw( ), halfdelay( ), nodelay( ), and keypad( ) below and in
curses(3X).

EXAMPLE

#include <curses.h>

main( )
{

    int ch;

    initscr( );
    cbreak( );    /* Explained later in the section "Input Options" */
    addstr("Press any character: ");
    refresh( );
    ch = getch( );
    printw("\n\n\nThe character entered was a '%c'.\n", ch);
    refresh( );
    endwin( );

}



The output from this program follows. The first refresh( ) sends the
addstr( ) character string from stdscr to the terminal:

Press any character: □

Then assume that a w is typed at the keyboard. getch( ) accepts the
character and assigns it to ch. Finally, the second refresh( ) is called and the
screen appears as follows:

Press any character: w


The character entered was a 'w'.

$□

For another example of getch( ), see the show program under "curses
Example Programs" in this chapter.

getstr( )

SYNOPSIS 

#include <curses.h> 

int getstr(str) 
char *str; 

NOTES 

■ getstr( ) reads characters and stores them in a buffer until a <CR>, 
<NL>, or <ENTER> is received from stdscr. getstr() does not 
check for buffer overflow. 

■ The characters read and stored are in a character string. 

■ getstr( ) is a macro; it calls getch( ) to read each character.

■ getstr( ) returns ERR if getch( ) returns ERR to it. Otherwise it 
returns OK. 

■ See the discussions about echo( ), noecho( ), cbreak( ), nocbreak( ),
raw( ), noraw( ), halfdelay( ), nodelay( ), and keypad( ) below and in
curses(3X).

EXAMPLE

#include <curses.h>

main( )
{

    char str[256];

    initscr( );
    cbreak( );    /* Explained later in the section "Input Options" */
    addstr("Enter a character string terminated by <CR>:\n\n");
    refresh( );
    getstr(str);
    printw("\n\n\nThe string entered was \n'%s'\n", str);
    refresh( );
    endwin( );

}

Assume you entered the string "I enjoy learning about the UNIX system."
The final screen (after entering <CR>) would appear as follows:

Enter a character string terminated by <CR>:

I enjoy learning about the UNIX system.


The string entered was
'I enjoy learning about the UNIX system.'

$□
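
Because getstr( ) does not check for buffer overflow, a program that cannot trust the length of its input may prefer to collect characters itself. The sketch below shows one way to do that in plain C. Everything in it is hypothetical: read_line_bounded( ) is not a curses routine, the next_char argument stands in for a getch( )-style function, and demo_next_char( ) with demo_input is a string-backed stand-in for a terminal so the routine can be tried without a screen.

```c
#include <assert.h>
#include <stddef.h>

const char *demo_input = "";    /* text the stand-in "terminal" will supply */

/* Stand-in for a getch( )-style routine: next character, or -1 at the end. */
int demo_next_char(void)
{
    if (*demo_input == '\0')
        return -1;
    return (unsigned char)*demo_input++;
}

/*
 * Hypothetical bounded line reader. Stores at most size - 1 characters
 * in buf, reads and discards the rest of the line, and always
 * terminates buf. Returns 0 on success, -1 if next_char reported an
 * error before the end of the line.
 */
int read_line_bounded(int (*next_char)(void), char *buf, size_t size)
{
    size_t n = 0;
    int c;

    assert(next_char != NULL && buf != NULL && size > 0);
    for (;;) {
        c = next_char();
        if (c < 0) {
            buf[n] = '\0';
            return -1;
        }
        if (c == '\n' || c == '\r')
            break;                   /* end of line */
        if (n < size - 1)
            buf[n++] = (char)c;      /* keep only what fits */
    }
    buf[n] = '\0';
    return 0;
}
```

With a six-byte buffer, the line "I enjoy learning" would be stored as "I enj" and the remaining characters dropped, rather than written past the end of the array.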

scanw( )

SYNOPSIS

#include <curses.h>

int scanw(fmt [, arg...])
char *fmt;

NOTES 

■ scanw( ) calls getstr( ) and parses an input line. 

■ Like scanf(3S), scanw( ) uses a format string to convert and assign to
a variable number of arguments.

■ scanw( ) returns the same values as scanf( ).

■ See scanf(3S) for more information.

EXAMPLE

#include <curses.h>

main( )
{

    char string[100];
    float number;

    initscr( );
    cbreak( );    /* Explained later in the */
    echo( );      /* section "Input Options" */
    addstr("Enter a number and a string separated by a comma: ");
    refresh( );
    scanw("%f,%s", &number, string);
    clear( );
    printw("The string was \"%s\" and the number was %f.", string, number);
    refresh( );
    endwin( );

}



Notice the two calls to refresh( ). The first call updates the screen with
the character string passed to addstr( ), the second with the string returned
from scanw( ). Also notice the call to clear( ). Assume you entered the
following when prompted: 2,twin. After running this program, your
terminal screen would appear as follows:

The string was "twin" and the number was 2.000000.

$□
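
Since scanw( ) reads a line with getstr( ) and then parses it the way scanf(3S) would, the parsing step can be tried on its own with sscanf(3S). The sketch below is an illustration under that assumption, not the curses source; parse_number_and_string( ) is a hypothetical name, and the %99s width is an addition that keeps the string from overrunning a 100-byte array.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the parsing step of scanw("%f,%s", ...).
 * line is the text that getstr( ) would have collected; text must
 * point to at least 100 bytes. Returns the number of items assigned,
 * as scanf( ) does.
 */
int parse_number_and_string(const char *line, float *number, char *text)
{
    assert(line != NULL && number != NULL && text != NULL);
    return sscanf(line, "%f,%99s", number, text);
}
```

Given the line 2,twin this returns 2, with number set to 2.0 and text set to twin. A line that does not begin with a number returns 0, which is a reason to check the value scanw( ) returns before using the arguments.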

Controlling Output and Input



Output Attributes 

When we talked about addch( ), we said that it writes a single character 
of the type chtype to stdscr. chtype has two parts: a part with information 
about the character itself and another part with information about a set of 
attributes associated with the character. The attributes allow a character to 
be printed in reverse video, in bold, in a particular color, underlined, and 
so on. 

stdscr always has a set of current attributes that it associates with each 
character as it is written. However, using the routine attrset( ) and related 
curses routines described below, you can change the current attributes. 
Below is a list of the attributes and what they mean: 

■ A_BLINK - blinking

■ A_BOLD - extra bright or bold

■ A_DIM - half bright

■ A_REVERSE - reverse video

■ A_STANDOUT - a terminal's best highlighting mode

■ A_UNDERLINE - underlining

■ A_ALTCHARSET - alternate character set (see the section "Drawing
Lines and Other Graphics" in this chapter)

■ COLOR_PAIR(n) - change foreground and background colors (see
the section on "Color Manipulation" in this chapter)

To use these attributes, you must pass them as arguments to attrset( ) and
related routines; they can also be combined with the character passed to
addch( ) by using the bitwise OR operator (|).



NOTE 



Not all terminals are capable of displaying all attributes. If a particular 
terminal cannot display a requested attribute, a curses program 
attempts to find a substitute attribute. If none is possible, the attribute 
is ignored.

Let's consider a use of one of these attributes. To display a word in
bold, you would use the following code:

printw("A word in ");
attrset(A_BOLD);
printw("boldface");
attrset(0);
printw(" really stands out.\n");




Attributes can be turned on singly, such as attrset(A_BOLD) in the
example, or in combination. To turn on blinking bold text, for example,
you would use attrset(A_BLINK | A_BOLD). Individual attributes can be
turned on and off with the curses routines attron( ) and attroff( ) without
affecting other attributes. attrset(0) turns all attributes off, including
changes you may have made to foreground and background color.

Notice the attribute called A_STANDOUT. You might use it to make
text attract the attention of a user. The particular hardware attribute used
for standout is the most visually pleasing attribute a terminal has. Standout
is typically implemented as reverse video or bold. Many programs don't
really need a specific attribute, such as bold or reverse video, but instead
just need to highlight some text. For such applications, the A_STANDOUT
attribute is recommended. Two convenient functions, standout( ) and
standend( ), can be used to turn this attribute on and off. standend( ), in fact,
turns off all attributes.

In addition to the attributes listed above, there are three bit masks
called A_CHARTEXT, A_ATTRIBUTES, and A_COLOR. You can use these
bit masks with the curses function inch( ) and the C bitwise AND (&)
operator to extract the character, attributes, or color-pair field of a position
on a terminal screen. See the discussion of inch( ) on the curses(3X) manual
page.
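
The idea of one value carrying both a character and its attributes can be shown with ordinary bit operations. In the sketch below, cell_t and the X_ masks are invented stand-ins for chtype, A_CHARTEXT, A_ATTRIBUTES, A_BOLD, and A_BLINK; the bit positions are assumptions chosen for illustration, since the real layout is private to <curses.h> and should never be relied on.

```c
#include <assert.h>

/* Hypothetical layout, for illustration only. */
typedef unsigned long cell_t;            /* stands in for chtype */

#define X_CHARTEXT   0x000000ffUL        /* stands in for A_CHARTEXT */
#define X_ATTRIBUTES 0xffffff00UL        /* stands in for A_ATTRIBUTES */
#define X_BOLD       0x00000100UL        /* stands in for A_BOLD */
#define X_BLINK      0x00000200UL        /* stands in for A_BLINK */

/* Combine a character with attribute bits, as in addch('a' | A_BOLD). */
cell_t make_cell(char ch, cell_t attrs)
{
    assert((attrs & X_CHARTEXT) == 0);   /* attributes must not overlap text */
    return ((cell_t)(unsigned char)ch) | attrs;
}

/* Extract the character, as in inch( ) & A_CHARTEXT. */
char cell_char(cell_t cell)
{
    return (char)(cell & X_CHARTEXT);
}

/* Extract the attributes, as in inch( ) & A_ATTRIBUTES. */
cell_t cell_attrs(cell_t cell)
{
    return cell & X_ATTRIBUTES;
}
```

In a real program you would write the same expressions with the names from <curses.h>, for example inch( ) & A_CHARTEXT, and leave the bit positions to the library.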

Following are descriptions of attrset( ) and the other curses routines that
you can use to manipulate attributes.

attron( ), attrset( ), and attroff( )

SYNOPSIS

#include <curses.h>

int attron( attrs )
chtype attrs;

int attrset( attrs )
chtype attrs;

int attroff( attrs )
chtype attrs; 

NOTES 

■ attron( ) turns on the requested attributes attrs in addition to any that
are currently on. attrs is of the type chtype, which is defined in
<curses.h>.

■ attrset( ) turns on the requested attributes attrs instead of any that are
currently turned on.

■ attroff( ) turns off the requested attributes attrs if they are on.

■ The attributes may be combined using the bitwise OR ( | ). 

■ All return OK. 

EXAMPLE 

See the highlight program under "curses Example Programs" in this 
chapter.

standout( ) and standend( )

SYNOPSIS 

#include <curses.h> 

int standout( ) 
int standend( ) 

NOTES 

■ standout( ) turns on the preferred highlighting attribute, 
A_STANDOUT, for the current terminal. This routine is equivalent 
to attron(A_STANDOUT). 

■ standend( ) turns off all attributes. This routine is equivalent to
attrset(0).

■ Both always return OK.

EXAMPLE

See the highlight program under "curses Example Programs" in this 
chapter.

Color Manipulation

The curses color manipulation routines allow you to use colors on an
alphanumeric terminal as you would use any other video attribute. You can
find out if the curses library on your system supports the color routines by
checking the file /usr/include/curses.h to see if it defines the macro
COLOR_PAIR(n).

This section begins with a description of the color feature at a general
level. Then, the use of color as an attribute is explained. Next, the ways to
define color-pairs and change the definitions of colors are explained. Finally,
there are guidelines for ensuring the portability of your program, and a
section describing the color manipulation routines and macros, with examples.

How the Color Feature Works

Colors are always used in pairs, consisting of a foreground color (used
for the character) and a background color (used for the field the character is
displayed on). curses uses this concept of color-pairs to manipulate colors.
In order to use color in a curses program, you must first define (initialize)
the individual colors, then create color-pairs using those colors, and finally,
use the color-pairs as attributes.

Actually, the process is even simpler, since curses maintains a table of 
initialized colors for you. This table has as many entries as the number of 
colors your terminal can display at one time. Each entry in the table has 
three fields: one each for the intensity of the red, green, and blue com- 
ponents in that color. 



NOTE 



curses uses RGB (Red, Green, Blue) color notation. This notation allows
you to specify directly the intensity of red, green, and blue light to be
generated in an additive system. Some terminals use an alternative
notation, known as HSL (Hue, Saturation, Luminosity) color notation.
Terminals that use HSL can be identified in the terminfo database, and curses
will make conversions to RGB notation automatically.

At the beginning of any curses program that uses color, all entries in the
colors table are initialized with eight basic colors, as follows:

                     Intensity of Component
                   (R)ed    (G)reen    (B)lue

/* black:   0 */       0          0         0
/* blue:    1 */       0          0      1000
/* green:   2 */       0       1000         0
/* cyan:    3 */       0       1000      1000
/* red:     4 */    1000          0         0
/* magenta: 5 */    1000          0      1000
/* yellow:  6 */    1000       1000         0
/* white:   7 */    1000       1000      1000

The Default Colors Table



Most color alphanumeric terminals can display eight colors at the same 
time, but if your terminal can display more than eight, then the table will 
have more than eight entries. The same eight colors will be used to initial- 
ize additional entries. If your terminal can display only N colors, where N 
is less than eight, then only the first N colors shown in the Colors Table 
will be used. 
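
The way the colors table is filled can be sketched in a few lines of C. The struct, the array, and fill_default_colors( ) below are hypothetical names used only for illustration. The sketch also assumes that entries beyond the eighth repeat the eight defaults in order; the text says only that the same eight colors are used, so treat that ordering as a guess.

```c
#include <assert.h>
#include <stddef.h>

struct rgb { short r, g, b; };      /* intensities, 0 to 1000 */

/* The eight defaults from the Default Colors Table, in table order. */
static const struct rgb basic_colors[8] = {
    {    0,    0,    0 },   /* 0: black   */
    {    0,    0, 1000 },   /* 1: blue    */
    {    0, 1000,    0 },   /* 2: green   */
    {    0, 1000, 1000 },   /* 3: cyan    */
    { 1000,    0,    0 },   /* 4: red     */
    { 1000,    0, 1000 },   /* 5: magenta */
    { 1000, 1000,    0 },   /* 6: yellow  */
    { 1000, 1000, 1000 }    /* 7: white   */
};

/*
 * Hypothetical helper: fill a table of ncolors entries. A terminal
 * with fewer than eight colors gets only the first ncolors defaults;
 * one with more gets the defaults again (assumed here to cycle).
 */
void fill_default_colors(struct rgb *table, int ncolors)
{
    int i;

    assert(table != NULL && ncolors >= 0);
    for (i = 0; i < ncolors; i++)
        table[i] = basic_colors[i % 8];
}
```

Under these assumptions, entry 6 of a ten-entry table is yellow (1000, 1000, 0), and entry 9 is a second copy of blue.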

You can change these color definitions with the routine init_color( ), if
your terminal is capable of redefining colors. (See the section "Changing
the Definitions of Colors" for more information.)

The following color macros are defined in curses.h and have numeric
values corresponding to their position in the Colors Table.

COLOR_BLACK     0
COLOR_BLUE      1
COLOR_GREEN     2
COLOR_CYAN      3
COLOR_RED       4
COLOR_MAGENTA   5
COLOR_YELLOW    6
COLOR_WHITE     7

curses also maintains a table of color-pairs, which has space allocated for
as many entries as the number of color-pairs that can be displayed on your 
terminal screen at the same time. Unlike the colors table, however, there 
are no default entries in the pairs table: it is your responsibility to initialize 
any color-pair you want to use, with init_pair( ), before you use it as an 
attribute. 

Each entry in the pairs table has two fields: the foreground color, and 
the background color. For each color-pair that you initialize, these two 
fields will each contain a number representing a color in the colors table. 
(Note that color-pairs can only be made from previously initialized colors.) 

The following example pairs table shows that a programmer has used
init_pair( ) to initialize color-pair 1 as a blue foreground (entry 1 in the
default color table) on yellow background (entry 6 in the default color
table). Similarly, the programmer has initialized color-pair 2 as a cyan
foreground on a magenta background. Uninitialized entries in the pairs table
would actually contain zeros, which correspond to black on black.

Note that color-pair 0 is reserved for use by curses and should not be
changed or used in application programs.

Color-Pair Number    Foreground    Background

0 (reserved)                  0             0
1                             1             6
2                             3             5
3                             0             0
4                             0             0
5                             0             0

Example of a Pairs Table



Two global variables used by the color routines are defined in
<curses.h>. They are COLORS, which contains the maximum number of
colors the terminal supports, and COLOR_PAIRS, which contains the
maximum number of color-pairs the terminal supports. Both are initialized by
the start_color( ) routine to values it gets from the terminfo database.

Upon termination of your curses program, all colors and/or color-pairs
will be restored to the values they had when the terminal was just turned
on.

Using the COLOR_PAIR(n) Attribute 

If you choose to use the default color definitions, there are only two
things you need to do before you can use the attribute COLOR_PAIR(n).
First, you must call the routine start_color( ). Once you've done that, you
can initialize color-pairs with the routine init_pair(pair, f, b). The first
argument, pair, is the number of the color-pair to be initialized (or changed), and
must be between 1 and COLOR_PAIRS-1. The arguments f and b are the
foreground color number and the background color number. The value of
these arguments must be between 0 and COLORS-1. For example, the two
color-pairs in the pairs table described earlier can be initialized in the
following way:

init_pair (1, COLOR_BLUE, COLOR_YELLOW);
init_pair (2, COLOR_CYAN, COLOR_MAGENTA);

Once you've initialized a color-pair, the attribute COLOR_PAIR(n) can
be used as you would use any other attribute. COLOR_PAIR(n) is a macro,
defined in <curses.h>. The argument, n, is the number of a previously
initialized color-pair. For example, you can use the routine attron( ) to turn on
a color-pair in addition to any other attributes you may currently have
turned on:

attron (COLOR_PAIR(1));

If you had initialized color-pair 1 in the way shown in the example pairs
table, then characters displayed after you turned on color-pair 1 with
attron( ) would be displayed as blue characters on a yellow background.

You can also combine COLOR_PAIR(n) with other attributes, for example:

attrset(A_BLINK | COLOR_PAIR(1));

would turn on blinking and whatever you have initialized color-pair 1 to
be. (attron( ) and attrset( ) are described in the "Controlling Output and
Input" section of this chapter, and also on the curses(3X) manual page in the
Programmer's Reference Manual.)
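
The argument ranges given above for init_pair( ) can be checked before the call is made. The function below is a sketch of such a check; pair_args_ok( ) is a hypothetical helper, not part of curses, and it takes the limits as plain arguments where a real program would pass COLORS and COLOR_PAIRS after calling start_color( ).

```c
#include <assert.h>

/*
 * Hypothetical check: returns 1 if pair, f, and b fall in the ranges
 * the text gives for init_pair( ), and 0 otherwise.
 */
int pair_args_ok(int pair, int f, int b, int colors, int color_pairs)
{
    assert(colors > 0 && color_pairs > 0);
    if (pair < 1 || pair > color_pairs - 1)
        return 0;               /* color-pair 0 is reserved for curses */
    if (f < 0 || f > colors - 1)
        return 0;               /* foreground is not a valid color number */
    if (b < 0 || b > colors - 1)
        return 0;               /* background is not a valid color number */
    return 1;
}
```

On a terminal with eight colors and eight color-pairs, the two init_pair( ) calls shown earlier pass this check, while a request for color-pair 0 or color-pair 8 does not.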

Changing the Definitions of Colors

If your terminal is capable of redefining colors, you can change the
predefined colors with the routine init_color(color, r, g, b). The first
argument, color, is the numeric value of the color you want to change, and the
last three, r, g, and b, are the intensities of the red, green, and blue
components, respectively, that the new color will contain. Once you change the
definition of a color, all occurrences of that color on your screen change
immediately.

So, for example, you could change the definition of color 1
(COLOR_BLUE by default), to be light blue, in the following way.

init_color (COLOR_BLUE, 0, 700, 1000);

If your terminal is not able to change the definition of a color, use of
init_color( ) returns ERR.
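
If the color values you start from are on some other scale, they have to be converted to the 0 to 1000 range that init_color( ) expects. The sketch below assumes, purely for illustration, that the starting values are 8-bit components from 0 to 255; to_curses_intensity( ) is a hypothetical helper and not a curses routine.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert an 8-bit color component (0 to 255)
 * to the 0 to 1000 scale, rounding to the nearest value.
 */
short to_curses_intensity(int v)
{
    assert(v >= 0 && v <= 255);
    return (short)((v * 1000 + 127) / 255);
}
```

The three results could then be passed as the r, g, and b arguments of init_color( ), whose return value still has to be checked because the terminal may not allow colors to be redefined.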

Portability Guidelines 

Like the rest of curses, the color manipulation routines have been 
designed to be terminal independent. But it must be remembered that the 
capabilities of terminals vary. For example, if you write a program for a ter- 
minal that can support sixty-four color-pairs, that program would not be 
able to produce the same color effects on a terminal that supports at most 
eight color-pairs. 

When you are writing a program that may be used on different termi- 
nals, you should follow these guidelines: 

Use at most seven color-pairs made from at most eight colors.

Programs that follow this guideline will run on most color terminals.
Only seven, not eight, color-pairs should be used, even though many
terminals support eight color-pairs, because curses reserves color-pair 0
for its own use.

Do not use color 0 as a background color.

This is recommended because on some terminals, no matter what color
you have defined it to be, color 0 will always be converted to black
when used for a background.

Combine color and other video attributes.

Programs that follow this guideline will provide some sort of highlighting,
even if the terminal is monochrome. On color terminals, as many
of the listed attributes as possible would be used. On monochrome
terminals, only the video attributes would be used, and the color attribute
would be ignored.

Use the global variables COLORS and COLOR_PAIRS rather than
constants when deciding how many colors or color-pairs your program
should use.
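
The first and last guidelines can be combined into one small calculation. The sketch below is an illustration only: portable_pair_budget( ) is a hypothetical helper that takes the run-time value of COLOR_PAIRS as an argument and returns how many color-pairs a portable program might plan to use.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of color-pairs to plan for, given the
 * value of COLOR_PAIRS reported for the terminal.
 */
int portable_pair_budget(int color_pairs)
{
    int usable;

    assert(color_pairs >= 0);
    usable = color_pairs - 1;   /* color-pair 0 is reserved for curses */
    if (usable < 0)
        usable = 0;             /* no color-pairs at all */
    if (usable > 7)
        usable = 7;             /* guideline: at most seven color-pairs */
    return usable;
}
```

A terminal that reports 64 color-pairs and one that reports 8 both yield a budget of 7, so a program written against this budget produces the same color effects on either.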

Other Macros and Routines 

There are two other macros defined in <curses.h> that you can use to
obtain information from the color-pair field in characters of type chtype.

■ A_COLOR is a bit mask to extract color-pair information. It can be
used to clear the color-pair field, and to determine if any color-pair is
being used.

■ PAIR_NUMBER(attrs) is the reverse of COLOR_PAIR(n). It returns
the color-pair number associated with the named attribute, attrs.

There are two color routines that give you information about the
terminal your program is running on. The routine has_colors( ) returns a
Boolean value: TRUE if the terminal supports colors, FALSE otherwise.
The routine can_change_color( ) also returns a Boolean value: TRUE if the
terminal supports colors and can change their definitions, FALSE otherwise.

There are two color routines that give you information about the colors
and color-pairs that are currently defined on your terminal. The routine
color_content( ) gives you a way to find the intensity of the RGB
components in an initialized color. It returns ERR if the color does not exist or
if the terminal cannot change color definitions, OK otherwise. The routine
pair_content( ) allows you to find out what colors a given color-pair consists
of. It returns ERR if the color-pair has not been initialized, OK otherwise.

These routines are explained in more detail on the curses(3X) manual
page in the Programmer's Reference Manual.

The routines start_color( ), init_color( ), and init_pair( ) are described on
the following pages, with examples of their use. You can also refer to the
program colors in the section "curses Program Examples," at the end of this
chapter, for an example of using the attribute of color in windows.

start_color( )

SYNOPSIS

#include <curses.h>

int start_color( )

NOTES 

■ This routine must be called if you want to use colors, and before any 
other color manipulation routine is called. It is good practice to call 
it right after initscr( ). 

■ It initializes eight default colors (black, blue, green, cyan, red, 
magenta, yellow, and white), and the global variables COLORS and 
COLOR_PAIRS. If the value corresponding to COLOR_PAIRS in 
the terminfo database is greater than 64, COLOR_PAIRS will be set 
to 64. 

■ It restores the terminal's colors to the values they had when the ter- 
minal was just turned on. 

■ It returns ERR if the terminal does not support colors, OK otherwise.

EXAMPLE

See the example under init_pair( ).

init_pair( )

SYNOPSIS

#include <curses.h> 

int init_pair (pair, f, b) 
short pair, f, b; 

NOTES 

■ init_pair( ) changes the definition of a color-pair.

■ Color-pairs must be initialized with init_pair( ) before they can be
used as the argument to the attribute macro COLOR_PAIR(n).

■ The value of the first argument, pair, is the number of a color-pair,
and must be between 1 and COLOR_PAIRS-1.

■ The value of the f (foreground) and b (background) arguments must
be between 0 and COLORS-1.

■ If the color-pair was previously initialized, the screen will be 
refreshed and all occurrences of that color-pair will change to the 
new definition. 

■ It returns OK if it was able to change the definition of the color-pair, 
ERR otherwise. 

EXAMPLE 

#include <curses.h>

main( )
{

    initscr( );

    if (start_color( ) == OK)
    {
        init_pair (1, COLOR_RED, COLOR_GREEN);
        attron (COLOR_PAIR (1));
        addstr ("Red on Green");
        getch( );
    }

    endwin( );

}

Also see the program colors in the section "curses Program Examples."

init_color( )

SYNOPSIS

#include <curses.h>

int init_color(color, r, g, b)
short color, r, g, b;

NOTES 

■ init_color( ) changes the definition of a color. 

■ The first argument, color, is the number of the color to be changed.
The value of color must be between 0 and COLORS-1.

■ The last three arguments, r, g, and b, are the amounts of red, green,
and blue (RGB) components in the new color. The values of these
three arguments must be between 0 and 1000.

■ When init_color( ) is used to change the definition of an entry in the
colors table, all places where the old color was used on the screen
immediately change to the new color.

■ It returns OK if it was able to change the definition of the color, ERR 
otherwise. 

EXAMPLE 

#include <curses.h>

main( )
{

    initscr( );

    if (start_color( ) == OK)
    {
        init_pair (1, COLOR_RED, COLOR_GREEN);
        attron (COLOR_PAIR (1));

        if (init_color (COLOR_RED, 0, 0, 1000) == OK)
            addstr ("BLUE ON GREEN");
        else
            addstr ("RED ON GREEN");
        getch( );
    }

    endwin( );

}

Bells, Whistles, and Flashing Lights

Occasionally, you may want to get a user's attention. Two curses rou- 
tines were designed to help you do this. They let you ring the terminal's 
chimes and flash its screen. 

flash( ) flashes the screen if possible, and otherwise rings the bell.
Flashing the screen is intended as a bell replacement, and is particularly
useful if the bell bothers someone within earshot of the user. The routine
beep( ) can be called when a real beep is desired. (If for some reason the
terminal is unable to beep, but able to flash, a call to beep( ) will flash the
screen.)

beep( ) and flash( )

SYNOPSIS

#include <curses.h> 

int flash( ) 
int beep( ) 

NOTES 

■ flash( ) tries to flash the terminal's screen, if possible, and, if not, tries 
to ring the terminal bell. 

■ beep( ) tries to ring the terminal bell, if possible, and, if not, tries to 
flash the terminal screen. 

■ Neither returns any useful value. 

Input Options 

The UNIX system does a considerable amount of processing on input 
before an application ever sees a character. For example, it does the follow- 
ing: 

■ echoes (prints back) characters to a terminal as they are typed 

■ interprets an erase character (typically #) and a line kill character 
(typically @) 

■ interprets a CTRL-D (control d) as end of file (EOF) 

■ interprets interrupt and quit characters 

■ strips the character's parity bit 

■ translates <CR> to <NL> 

Because a curses program maintains total control over the screen, curses 
turns off echoing on the UNIX system and does echoing itself. At times, 
you may not want the UNIX system to process other characters in the stan- 
dard way in an interactive screen management program. Some curses rou- 
tines, noecho( ) and cbreak( ), for example, have been designed so that you 
can change the standard character processing. Using these routines in an 
application controls how input is interpreted. Figure 10-7 shows some of 
the major routines for controlling input. 

Every curses program accepting input should set some input options. 
This is because when the program starts running, the terminal on which it 
runs may be in cbreak( ), raw( ), nocbreak( ), or noraw( ) mode. Although 
the curses program starts up in echo( ) mode, as Figure 10-7 shows, none of 
the other modes are guaranteed. 

The combination of noecho( ) and cbreak( ) is most common in interac- 
tive screen management programs. Suppose, for instance, that you don't 
want the characters sent to your application program to be echoed wherever 
the cursor currently happens to be; instead, you want them echoed at the 
bottom of the screen. The curses routine noecho( ) is designed for this pur- 
pose. However, when noecho( ) turns off echoing, normal erase and kill 
processing is still on. Using the routine cbreak( ) causes these characters to 
be uninterpreted. 

Input                     Characters            Characters
Options                   Interpreted           Uninterpreted
------------------------  --------------------  --------------------
Normal                    interrupt, quit
'out of curses state'     stripping
                          <CR> to <NL>
                          echoing
                          erase, kill
                          EOF

Normal curses             echoing               All else
'start up state'          (simulated)           undefined.

cbreak( )                 interrupt, quit       erase, kill
and echo( )               stripping             EOF
                          echoing

cbreak( )                 interrupt, quit       echoing
and noecho( )             stripping             erase, kill
                                                EOF

nocbreak( )               break, quit           echoing
and noecho( )             stripping
                          erase, kill
                          EOF

nocbreak( )               See caution below.
and echo( )

nl( )                     <CR> to <NL>

nonl( )                                         <CR> to <NL>

raw( )                                          break, quit
(instead of                                     stripping
cbreak( ))

Figure 10-7: Input Option Settings for curses Programs 

CAUTION 

Do not use the combination nocbreak( ) and echo( ). If you use it 
in a program and also use getch( ), the program will go in and out of 
cbreak( ) mode to get each character. Depending on the state of the 
tty driver when each character is typed, the program may produce 
undesirable output. 



In addition to the routines noted in Figure 10-7, you can use the curses 
routines noraw( ), halfdelay( ), and nodelay( ) to control input. See the 
curses(3X) manual page for discussions of these routines. 

The next few pages describe noecho( ), cbreak( ) and the related routines 
echo( ) and nocbreak( ) in more detail. 

echo( ) and noecho( ) 

SYNOPSIS 

#include <curses.h> 

int echo( ) 
int noecho( ) 

NOTES 

■ echo() turns on echoing of characters by curses as they are read in. 
This is the initial setting. 

■ noecho( ) turns off the echoing. 

■ Neither returns any useful value. 

■ curses programs may not run properly if you turn on echoing with 
nocbreak( ). See Figure 10-7 and accompanying caution. After you 
turn echoing off, you can still echo characters with addch( ). 

EXAMPLE 

See the editor and show programs under "curses Program Examples" in 
this chapter. 

cbreak( ) and nocbreak( ) 

SYNOPSIS 

#include <curses.h> 
int cbreak( ) 
int nocbreak( ) 

NOTES 

■ cbreak( ) turns on 'break for each character' processing. A program 
gets each character as soon as it is typed, but the erase, line kill, and 
CTRL-D characters are not interpreted. 

■ nocbreak( ) returns to normal 'line at a time' processing. This is typi- 
cally the initial setting. 

■ Neither returns any useful value. 

■ A curses program may not run properly if cbreak( ) is turned on and 
off within the same program or if the combination nocbreak( ) and 
echo( ) is used. 

■ See Figure 10-7 and accompanying caution. 

EXAMPLE 

See the editor and show programs under "curses Program Examples" in 
this chapter. 

Building Windows and Pads 

An earlier section in this chapter, "More about refresh( ) and Windows" 
explained what windows and pads are and why you might want to use 
them. This section describes the curses routines you use to manipulate and 
create windows and pads. 

Output and Input 

The routines that you use to send output to and get input from win- 
dows and pads are similar to those you use with stdscr. The only difference 
is that you have to give the name of the window to receive the action. 
Generally, these functions have names formed by putting the letter w at the 
beginning of the name of a stdscr routine and adding the window name as 
the first parameter. For example, addch('c') would become waddch(mywin, 
'c') if you wanted to write the character c to the window mywin. Here's a 
list of the window (or w) versions of the output routines discussed in "Get- 
ting Simple Output and Input." 

■ waddch(win, ch) 

■ mvwaddch(win, y, x, ch) 

■ waddstr(win, str) 

■ mvwaddstr(win, y, x, str) 

■ wprintw(win, fmt [, arg...]) 

■ mvwprintw(win, y, x, fmt [, arg...]) 

■ wmove(win, y, x) 

■ wclear(win) and werase(win) 

■ wclrtoeol(win) and wclrtobot(win) 

■ wrefresh(win) 

You can see from their declarations that these routines differ from the 
versions that manipulate stdscr only in their names and the addition of a 
win argument. Notice that the routines whose names begin with mvw take 
the win argument before the y, x coordinates, which is contrary to what the 
names imply. See curses(3X) for more information about these routines or 
the versions of the input routines getch( ), getstr( ), and so on that you should 
use with windows. 

All w routines can be used with pads except for wrefresh( ) and 
wnoutrefresh( ) (see below). In place of these two routines, you have to use 
prefresh( ) and pnoutrefresh( ) with pads. 

The Routines wnoutrefresh( ) and doupdate( ) 

If you recall from the earlier discussion about refresh( ), we said that it 
sends the output from stdscr to the terminal screen. We also said that it 
was a macro that expands to wrefresh(stdscr) (see "What Every curses 
Program Needs" and "More about refresh( ) and Windows"). 

The wrefresh( ) routine is used to send the contents of a window (stdscr 
or one that you create) to a screen; it calls the routines wnoutrefresh( ) and 
doupdate( ). Similarly, prefresh( ) sends the contents of a pad to a screen by 
calling pnoutrefresh( ) and doupdate( ). 

Using wnoutrefresh( ) and doupdate( ), you can update terminal screens 
more efficiently than with wrefresh( ) by itself. (The same is true of 
pnoutrefresh( ); this discussion is limited to wnoutrefresh( ) for 
simplicity.) wrefresh( ) works by first calling wnoutrefresh( ), which 
copies the named window to a data structure referred to as the virtual 
screen. The virtual screen contains what a program intends to display at 
a terminal. After calling wnoutrefresh( ), wrefresh( ) then calls 
doupdate( ), which compares the virtual screen to the physical screen and 
does the actual update. If you want to output several windows at once, 
calling wrefresh( ) for each one will result in alternating calls to 
wnoutrefresh( ) and doupdate( ), causing several bursts of output to a 
screen. However, by calling wnoutrefresh( ) for each window and then 
doupdate( ) only once, you can minimize the total number of characters 
transmitted and the processor time used. The following sample program 
uses only one doupdate( ): 

#include <curses.h> 

main( ) 
{ 
    WINDOW *w1, *w2; 

    initscr( ); 
    w1 = newwin(2,6,0,3); 
    w2 = newwin(1,4,5,4); 
    waddstr(w1, "Bulls"); 
    wnoutrefresh(w1); 
    waddstr(w2, "Eye"); 
    wnoutrefresh(w2); 
    doupdate( ); 
    endwin( ); 
} 




Notice from the sample that you declare a new window at the begin- 
ning of a curses program. The lines 

    w1 = newwin(2,6,0,3); 
    w2 = newwin(1,4,5,4); 

declare two windows named w1 and w2 with the routine newwin( ) accord- 
ing to certain specifications. newwin() is discussed in more detail below. 

Figure 10-8 illustrates the effect of wnoutrefresh( ) and doupdate( ) on 
these two windows, the virtual screen, and the physical screen: 

[Figure 10-8 is a sequence of diagrams, one for each call in the sample 
program: initscr( ), w1 = newwin( ), w2 = newwin( ), waddstr(w1, "Bulls"), 
wnoutrefresh(w1), waddstr(w2, "Eye"), wnoutrefresh(w2), doupdate( ), and 
endwin( ). Each diagram shows stdscr @ (0,0), w1 @ (0,3), w2 @ (5,4), the 
virtual screen, and the physical screen. The physical screen is shown as 
garbage until doupdate( ) is called.] 

Figure 10-8: The Relationship Between a Window and a Terminal Screen 

New Windows 

Following are descriptions of the routines newwin( ) and subwin( ), 
which you use to create new windows. For information about creating new 
pads with newpad( ) and subpad( ), see the curses(3X) manual page. 

newwin( ) 
SYNOPSIS 

#include <curses.h> 

WINDOW *newwin(nlines, ncols, begin_y, begin_x) 
int nlines, ncols, begin_y, begin_x; 

NOTES 

■ newwin( ) returns a pointer to a new window with a new data area. 

■ The variables nlines and ncols give the size of the new window. 

■ begin_y and begin_x give the screen coordinates from (0,0) of the 
upper left corner of the window as it is refreshed to the current 
screen. 

EXAMPLE 

Recall the sample program using two windows; see Figure 10-8. Also 
see the window program under "curses Program Examples" in this chapter. 

subwin( ) 
SYNOPSIS 

#include <curses.h> 

WINDOW *subwin(orig, nlines, ncols, begin_y, begin_x) 
WINDOW *orig; 
int nlines, ncols, begin_y, begin_x; 

NOTES 

■ subwin( ) returns a new window that points to a section of another 
window, orig. 

■ nlines and ncols give the size of the new window. 

■ begin_y and begin_x give the screen coordinates of the upper left 
corner of the window as it is refreshed to the current screen. 

■ Subwindows and original windows can accidentally overwrite one 
another. 

Subwindows of subwindows do not work (as of the copyright date of 
this Programmer's Guide). 

EXAMPLE 

#include <curses.h> 

main( ) 
{ 
    WINDOW *sub; 

    initscr( ); 
    box(stdscr,'w','w');   /* See the curses(3X) manual page for box( ) */ 
    mvwaddstr(stdscr,7,10," this is 10,10"); 
    mvwaddch(stdscr,8,10,'|'); 
    mvwaddch(stdscr,9,10,'v'); 
    sub = subwin(stdscr,10,20,10,10); 
    box(sub,'s','s'); 
    wnoutrefresh(stdscr); 
    wrefresh(sub); 
    endwin( ); 
} 

This program prints a border of w's around the stdscr (the sides of your 
terminal screen) and a border of s's around the subwindow sub when it is run. For 
another example, see the window program under "curses Program Examples" in this 
chapter. 

Using Advanced curses Features 

Knowing how to use the basic curses routines to get output and input 
and to work with windows, you can design screen management programs 
that meet the needs of many users. The curses library, however, has rou- 
tines that let you do more in a program than handle I/O and multiple win- 
dows. The following few pages briefly describe some of these routines and 
what they can help you do— namely, draw simple graphics, use a terminal's 
soft labels, and work with more than one terminal in a single curses pro- 
gram. 

You should be comfortable using the routines previously discussed in 
this chapter and the other routines for I/O and window manipulation dis- 
cussed on the curses(3X) manual page before you try to use the advanced 
curses features. 

NOTE 

The routines described under "Routines for Drawing Lines and Other 
Graphics" and "Routines for Using Soft Labels" are features that are 
new for UNIX System V Release 3.0. If a program uses any of these 
routines, it may not run on earlier releases of the UNIX system. You 
must use the Release 3.0 version of the curses library on UNIX 
System V Release 3.0 to work with these routines. 



Routines for Drawing Lines and Other Graphics 

Many terminals have an alternate character set for drawing simple 
graphics (or glyphs or graphic symbols). You can use this character set in 
curses programs. curses uses the same names for glyphs as the VT100 line 
drawing character set. 

To use the alternate character set in a curses program, you pass a set of 
variables whose names begin with ACS_ to the curses routine waddch( ) or 
a related routine. For example, ACS_ULCORNER is the variable for the 
upper left corner glyph. If a terminal has a line drawing character for this 
glyph, ACS_ULCORNER's value is the terminal's character for that glyph 
OR'd ( | ) with the bit-mask A_ALTCHARSET. If no line drawing character 
is available for that glyph, a standard ASCII character that approximates the 
glyph is stored in its place. For example, the default character for 
ACS_HLINE, a horizontal line, is a - (minus sign). When a close 
approximation is not available, a + (plus sign) is used. All the standard ACS_ 
names and their defaults are listed on the curses(3X) manual page. 

Part of an example program that uses line drawing characters follows. 
The example uses the curses routine box( ) to draw a box around a menu on 
a screen. box( ) uses the line drawing characters by default or when | (the 
pipe) and - are chosen. (See curses(3X).) Up and down more indicators are 
drawn on the box border (using ACS_UARROW and ACS_DARROW) if 
the menu contained within the box continues above or below the screen: 



    box(menuwin, ACS_VLINE, ACS_HLINE); 

    /* output the up/down arrows */ 
    wmove(menuwin, maxy, maxx - 5); 

    /* output up arrow or horizontal line */ 
    if (moreabove) 
        waddch(menuwin, ACS_UARROW); 
    else 
        waddch(menuwin, ACS_HLINE); 

    /* output down arrow or horizontal line */ 
    if (morebelow) 
        waddch(menuwin, ACS_DARROW); 
    else 
        waddch(menuwin, ACS_HLINE); 



Here's another example. Because a default down arrow (like the lower- 
case letter v) isn't very discernible on a screen with many lowercase charac- 
ters on it, you can change it to an uppercase V. 

    if (!(ACS_DARROW & A_ALTCHARSET)) 
        ACS_DARROW = 'V'; 

For more information, see curses(3X) in the Programmer's Reference 
Manual. 

Routines for Using Soft Labels 

Another feature available on most terminals is a set of soft labels across 
the bottom of their screens. A terminal's soft labels are usually matched 
with a set of hard function keys on the keyboard. There are usually eight 
of these labels, each of which is usually eight characters wide and one or 
two lines high. 

The curses library has routines that provide a uniform model of eight 
soft labels on the screen. If a terminal does not have soft labels, the bottom 
line of its screen is converted into a soft label area. It is not necessary for 
the keyboard to have hard function keys to match the soft labels for a 
curses program to make use of them. 

Let's briefly discuss most of the curses routines needed to use soft 
labels: slk_init( ), slk_set( ), slk_refresh( ) and slk_noutrefresh( ), 
slk_clear( ), slk_restore( ), slk_attron( ), slk_attrset( ), and slk_attroff( ). 

When you use soft labels in a curses program, you have to call the routine 
slk_init( ) before initscr( ). This sets an internal flag for initscr( ) to look 
at that says to use the soft labels. If initscr( ) discovers that there are fewer 
than eight soft labels on the screen, that they are smaller than eight charac- 
ters in size, or that there is no way to program them, then it will remove a 
line from the bottom of stdscr to use for the soft labels. The size of stdscr 
and the LINES variable will be reduced by 1 to reflect this change. A prop- 
erly written program, one that is written to use the LINES and COLS vari- 
ables, will continue to run as if the line had never existed on the screen. 

slk_init( ) takes a single argument. It determines how the labels are 
grouped on the screen should a line get removed from stdscr. The choices 
are between a 3-2-3 arrangement as appears on AT&T terminals, or a 4-4 
arrangement as appears on Hewlett-Packard terminals. The curses routines 
adjust the width and placement of the labels to maintain the pattern. The 
widest label generated is eight characters. 

The routine slk_set() takes three arguments, the label number (1-8), the 
string to go on the label (up to eight characters), and the justification within 
the label (0 = left justified, 1 = centered, and 2 = right justified). 
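
The effect of the justification argument can be pictured with a small
model. The function format_label( ) below is hypothetical and is not how
the library is implemented; it only shows, for an eight-character label,
where the text would land for each of the three justification values.

```c
#include <assert.h>
#include <string.h>

#define LABEL_WIDTH 8   /* width of a soft label, as given in the text */

/*
 * format_label() is a hypothetical model of how a label string could
 * be placed within an eight-character soft label; it is not the
 * library's implementation. just follows the slk_set() convention:
 * 0 = left justified, 1 = centered, 2 = right justified.
 * out must have room for LABEL_WIDTH + 1 characters.
 */
void format_label(const char *text, int just, char *out)
{
    size_t len, pad, left;

    assert(text != NULL && out != NULL);
    len = strlen(text);
    if (len > LABEL_WIDTH)
        len = LABEL_WIDTH;          /* truncate labels that are too long */
    pad = LABEL_WIDTH - len;
    if (just == 1)
        left = pad / 2;
    else if (just == 2)
        left = pad;
    else
        left = 0;

    memset(out, ' ', LABEL_WIDTH);
    memcpy(out + left, text, len);
    out[LABEL_WIDTH] = '\0';
}
```

For the string "Help", the model gives "Help" followed by four blanks when
left justified, two blanks on each side when centered, and four leading
blanks when right justified.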

The routine slk_noutrefresh( ) is comparable to wnoutrefresh( ) in that 
it copies the label information onto the internal screen image, but it does 
not cause the screen to be updated. Since a wrefresh( ) commonly follows, 
slk_noutrefresh( ) is the function that is most commonly used to output the 
labels. 

Just as wrefresh( ) is equivalent to a wnoutrefresh( ) followed by a 
doupdate( ), so too the function slk_refresh( ) is equivalent to a 
slk_noutrefresh( ) followed by a doupdate( ). 

If initscr( ) uses the bottom line of stdscr to simulate soft labels, the 
routines slk_attron( ), slk_attrset( ), and slk_attroff( ) can be used to manipulate 
the appearance of the simulated soft labels. If you use these routines to 
manipulate the color of the simulated soft labels, keep in mind that soft 
labels are shown in reverse video by default. Note that these routines will 
have no effect on soft function key labels supplied by the terminal. These 
routines are similar to attron( ), attrset( ), and attroff( ) (see the section 
"Controlling Output and Input" in this chapter). 

To prevent the soft labels from getting in the way of a shell escape, 
slk_clear( ) may be called before doing the endwin( ). This clears the soft 
labels off the screen and does a doupdate( ). The function slk_restore( ) may 
be used to restore them to the screen. See the curses(3X) manual page for 
more information about the routines for using soft labels. 

Working with More than One Terminal 

A curses program can produce output on more than one terminal at the 
same time. This is useful for single process programs that access a common 
database, such as multi-player games. 

Writing programs that output to multiple terminals is a difficult busi- 
ness, and the curses library does not solve all the problems you might 
encounter. For instance, the programs — not the library routines — must 
determine the file name of each terminal line, and what kind of terminal is 
on each of those lines. The standard method, checking $TERM in the 
environment, does not work, because each process can only examine its own 
environment. 

Another problem you might face is that of multiple programs reading 
from one line. This situation produces a race condition and should be 
avoided. However, a program trying to take over another terminal cannot 
just shut off whatever program is currently running on that line. (Usually, 
security reasons would also make this inappropriate. But, for some applica- 
tions, such as an inter-terminal communication program, or a program that 
takes over unused terminal lines, it would be appropriate.) A typical solu- 
tion to this problem requires each user logged in on a line to run a program 
that notifies a master program that the user is interested in joining the mas- 
ter program and tells it the notification program's process ID, the name of 
the tty line, and the type of terminal being used. Then the program goes to 
sleep until the master program finishes. When done, the master program 
wakes up the notification program and all programs exit. 

A curses program handles multiple terminals by always having a 
current terminal. All function calls always affect the current terminal. The 
master program should set up each terminal, saving a reference to the ter- 
minals in its own variables. When it wishes to affect a terminal, it should 
set the current terminal as desired, and then call ordinary curses routines. 

References to terminals in a curses program have the type SCREEN*. A 
new terminal is initialized by calling newterm(type, outfd, infd). newterm( ) 
returns a screen reference to the terminal being set up. type is a character 
string naming the kind of terminal being used. outfd is a stdio(3S) file 
pointer (FILE*) used for output to the terminal and infd a file pointer for 
input from the terminal. This call replaces the normal call to initscr( ), 
which calls newterm(getenv("TERM"), stdout, stdin). 

To change the current terminal, call set_term(sp) where sp is the screen 
reference to be made current. set_term( ) returns a reference to the previous 
terminal. 

It is important to realize that each terminal has its own set of windows 
and options. Each terminal must be initialized separately with newterm( ). 
Options such as cbreak( ) and noecho( ) must be set separately for each ter- 
minal. The functions endwin( ) and refresh( ) must be called separately for 
each terminal. Figure 10-9 shows a typical scenario to output a message to 
several terminals. 



    for (i=0; i<nterm; i++) 
    { 
        set_term(terms[i]); 
        mvaddstr(0, 0, "Important message"); 
        refresh( ); 
    } 




Figure 10-9: Sending a Message to Several Terminals 



See the two program under "curses Program Examples" in this chapter 
for a more complete example. 

Some programs need to use lower level routines (i.e., primitives) than 
those offered by the curses routines. For such programs, the terminfo rou- 
tines are offered. They do not manage your terminal screen, but rather give 
you access to strings and capabilities which you can use yourself to manipu- 
late the terminal. 

There are three circumstances when it is proper to use terminfo rou- 
tines. The first is when you need only some screen management capabili- 
ties, for example, making text standout on a screen. The second is when 
writing a filter. A typical filter does one transformation on an input stream 
without clearing the screen or addressing the cursor. If this transformation 
is terminal dependent and clearing the screen is inappropriate, use of the 
terminfo routines is worthwhile. The third is when you are writing a spe- 
cial purpose tool that sends a special purpose string to the terminal, such as 
programming a function key, setting tab stops, sending output to a printer 
port, or dealing with the status line. Otherwise, you are discouraged from 
using these routines: the higher level curses routines make your program 
more portable to other UNIX systems and to a wider class of terminals. 



NOTE 



You are discouraged from using terminfo routines except for the pur- 
poses noted, because curses routines take care of all the glitches 
present in physical terminals. When you use the terminfo routines, 
you must deal with the glitches yourself. Also, these routines may 
change and be incompatible with previous releases. 



What Every terminfo Program Needs 

A terminfo program typically includes the header files and routines 
shown in Figure 10-10. 

#include <curses.h> 
#include <term.h> 

    setupterm( (char*)0, 1, (int*)0 ); 

    putp(clear_screen); 

    reset_shell_mode( ); 
    exit(0); 





Figure 10-10: Typical Framework of a terminfo Program 



The header files <curses.h> and <term.h> are required because they 
contain the definitions of the strings, numbers, and flags used by the 
terminfo routines. setupterm( ) takes care of initialization. Passing this 
routine the values (char*)0, 1, and (int*)0 invokes reasonable defaults. If 
setupterm( ) can't figure out what kind of terminal you are on, it prints an error 
message and exits. reset_shell_mode( ) performs functions similar to 
endwin( ) and should be called before a terminfo program exits. 

A global variable like clear_screen is defined by the call to setupterm( ). 
It can be output using the terminfo routines putp( ) or tputs( ), which gives 
a user more control. This string should not be directly output to the termi- 
nal using the C library routine printf(3S), because it contains padding infor- 
mation. A program that directly outputs strings will fail on terminals that 
require padding or that use the xon/xoff flow control protocol. 

At the terminfo level, the higher level routines like addch( ) and 
getch( ) are not available. It is up to you to output whatever is needed. For 
a list of capabilities and a description of what they do, see terminfo(4); see 
curses(3X) for a list of all the terminfo routines. 

Compiling and Running a terminfo Program 

The general command line for compiling and the guidelines for run- 
ning a program with terminfo routines are the same as those for compiling 
any other curses program. See the sections "Compiling a curses Program" 
and "Running a curses Program" in this chapter for more information. 



An Example terminfo Program 

The example program termhl shows a simple use of terminfo routines. 
It is a version of the highlight program (see "curses Program Examples") 
that does not use the higher level curses routines. termhl can be used as a 
filter. It includes the strings to enter bold and underline mode and to turn 
off all attributes. 




/* 
 * A terminfo level version of the highlight program. 
 */ 

#include <curses.h> 
#include <term.h> 

int ulmode = 0;    /* Currently underlining */ 

main(argc, argv) 
int argc; 
char **argv; 
{ 
    FILE *fd; 
    int c, c2; 
    int outch( ); 

    if (argc > 2) 
    { 
        fprintf(stderr, "Usage: termhl [file]\n"); 
        exit(1); 
    } 

    if (argc == 2) 
    { 
        fd = fopen(argv[1], "r"); 
        if (fd == NULL) 
        { 
            perror(argv[1]); 
            exit(2); 
        } 
    } 
    else 
    { 
        fd = stdin; 
    } 

    setupterm((char*)0, 1, (int*)0); 

    for (;;) 
    { 
        c = getc(fd); 
        if (c == EOF) 
            break; 
        if (c == '\\') 
        { 
            c2 = getc(fd); 
            switch (c2) 
            { 
            case 'B': 
                tputs(enter_bold_mode, 1, outch); 
                continue; 
            case 'U': 
                tputs(enter_underline_mode, 1, outch); 
                ulmode = 1; 
                continue; 
            case 'N': 
                tputs(exit_attribute_mode, 1, outch); 
                ulmode = 0; 
                continue; 
            } 
            putch(c); 
            putch(c2); 
        } 
        else 
            putch(c); 
    } 

    fclose(fd); 
    fflush(stdout); 
    resetterm( ); 
    exit(0); 
} 

/* 
 * This function is like putchar, but it checks for underlining. 
 */ 
putch(c) 
int c; 
{ 
    outch(c); 
    if (ulmode && underline_char) 
    { 
        outch('\b'); 
        tputs(underline_char, 1, outch); 
    } 
} 

/* 
 * Outchar is a function version of putchar that can be passed to 
 * tputs as a routine to call. 
 */ 
outch(c) 
int c; 
{ 
    putchar(c); 
} 




Let's discuss the use of the function tputs(cap, affcnt, outc) in this 
program to gain some insight into the terminfo routines. tputs( ) applies 
padding information. Some terminals have the capability to delay output. 
Their terminal descriptions in the terminfo database probably contain 
strings like $<20>, which means to pad for 20 milliseconds (see the 
following section "Specify Capabilities" in this chapter). tputs( ) generates 
enough pad characters to delay for the appropriate time. 

tputs( ) has three parameters. The first parameter is the string capability 
to be output. The second is the number of lines affected by the capability. 
(Some capabilities may require padding that depends on the number of 
lines affected. For example, insert_line may have to copy all lines below 
the current line, and may require time proportional to the number of lines 
copied. By convention affcnt is 1 if no lines are affected. The value 1 is 
used, rather than 0, for safety, since affcnt is multiplied by the amount of 
time per item, and anything multiplied by 0 is 0.) The third parameter is a 
routine to be called with each character. 
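
The arithmetic behind padding can be sketched as follows. This is a
simplified model, not the terminfo implementation: the function
pad_chars( ) is hypothetical, it treats the delay as proportional to the
number of lines affected, and it assumes that 10 bits are transmitted for
each character, so that a line running at a given baud rate carries
baud / 10 characters per second.

```c
#include <assert.h>

/*
 * pad_chars() is a simplified model of the arithmetic behind padding,
 * not the terminfo implementation. It assumes 10 bits are sent per
 * character, so a line running at baud bits per second carries
 * baud / 10 characters per second.
 *
 * ms_per_item  delay per affected line, in milliseconds
 * affcnt       number of lines affected
 * baud         line speed in bits per second
 *
 * Returns the number of pad characters needed, rounded up.
 */
long pad_chars(long ms_per_item, long affcnt, long baud)
{
    long chars_per_sec, total_ms;

    assert(ms_per_item >= 0 && affcnt >= 0 && baud > 0);
    chars_per_sec = baud / 10;
    total_ms = ms_per_item * affcnt;
    return (total_ms * chars_per_sec + 999) / 1000;
}
```

Under these assumptions, a 20 millisecond delay at 9600 baud with an affcnt
of 1 works out to 20 pad characters, while an affcnt of 0 would produce no
padding at all, which is the reason for the convention of passing 1.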

For many simple programs, affcnt is always 1 and outc always calls 
putchar. For these programs, the routine putp(cap) is a convenient 
abbreviation. termhl could be simplified by using putp( ). 

Now to understand why you should use the curses level routines 
instead of terminfo level routines whenever possible, note the special check 
for the underline_char capability in this sample program. Some terminals, 
rather than having a code to start underlining and a code to stop 
underlining, have a code to underline the current character. termhl keeps track of 
the current mode, and if the current character is supposed to be underlined, 
outputs underline_char, if necessary. Low level details such as this are 
precisely why the curses level is recommended over the terminfo level. curses 
takes care of terminals with different methods of underlining and other 
terminal functions. Programs at the terminfo level must handle such details 
themselves. 

termhl was written to illustrate a typical use of the terminfo routines. 
It is more complex than it need be in order to illustrate some properties of 
terminfo programs. The routine vidattr( ) (see curses(3X)) could have been 
used instead of directly outputting enter_bold_mode, 
enter_underline_mode, and exit_attribute_mode. In fact, the program 
would be more robust if it did, since there are several ways to change video 
attribute modes. 

The terminfo database describes the many terminals with which curses 
programs, as well as some UNIX system tools, like vi(1), can be used. Each 
terminal description is a compiled file containing the names that the 
terminal is known by and a group of comma-separated fields describing the 
actions and capabilities of the terminal. This section describes the terminfo 
database, related support tools, and their relationship to the curses library. 

Writing Terminal Descriptions 

Many popular terminals are already described in the terminfo
database. However, it is possible that you'll want to run a curses
program on a terminal for which there is not currently a description. In
that case, you'll have to build the description.

The general procedure for building a terminal description is as follows: 

1. Give the known names of the terminal.

2. Learn about, list, and define the known capabilities. 

3. Compile the newly-created description entry. 

4. Test the entry for correct operation. 

5. Go back to step 2, add more capabilities, and repeat, as necessary. 

Building a terminal description is sometimes easier when you build 
small parts of the description and test them as you go along. These tests 
can expose deficiencies in the ability to describe the terminal. Also, modify- 
ing an existing description of a similar terminal can make the building task 
easier. (Lest we forget the UNIX motto: Build on the work of others.) 

In the next few pages, we follow each step required to build a terminal 
description for the fictitious terminal named "myterm." 

Name the Terminal 

The name of a terminal is the first information given in a terminfo
terminal description. This string of names, assuming there is more than one
name, is separated by pipe symbols ( | ). The first name given should be the
most common abbreviation for the terminal. The last name given should be
a long name that fully identifies the terminal. The long name is usually the
manufacturer's formal name for the terminal. All names between the first
and last entries should be known synonyms for the terminal name. All
names but the formal name should be typed in lowercase letters and contain
no blanks. Naturally, the formal name is entered as closely as possible to
the manufacturer's name.

Here is the name string from the description of the AT&T Teletype 5420 
Buffered Display Terminal: 

        5420|att5420|AT&T Teletype 5420,

Notice that the first name is the most commonly used abbreviation and the 
last is the long name. Also notice the comma at the end of the name string. 

Here's the name string for our fictitious terminal, myterm: 

        myterm|mytm|mine|fancy|terminal|My FANCY Terminal,

Terminal names should follow common naming conventions. These 
conventions start with a root name, like 5425 or myterm, for example. The 
root name should not contain odd characters, like hyphens, that may not be 
recognized as a synonym for the terminal name. Possible hardware modes 
or user preferences should be shown by adding a hyphen and a 'mode indi- 
cator' at the end of the name. For example, the 'wide mode' (which is 
shown by a -w) version of our fictitious terminal would be described as 
myterm-w. term(5) describes mode indicators in greater detail. 
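
The naming rules above can be illustrated with a short C sketch that splits
a name string at the pipe symbols and picks out a mode indicator. The
helpers split_names and mode_indicator, and the limits MAXNAMES and
MAXNAMELEN, are hypothetical names introduced only for this example; they
are not part of curses or of the terminfo tools.

```c
#include <assert.h>
#include <string.h>

#define MAXNAMES   16      /* illustrative limits, not terminfo limits */
#define MAXNAMELEN 128

/* Hypothetical helper: split a name string such as
 * "5420|att5420|AT&T Teletype 5420," into its names.  A trailing
 * comma, if present, is dropped.  Returns the number of names stored. */
int split_names(const char *field, char names[MAXNAMES][MAXNAMELEN])
{
    int count = 0;
    size_t len;
    const char *p, *end;

    assert(field != NULL);
    len = strlen(field);
    if (len > 0 && field[len - 1] == ',')
        len--;
    p = field;
    end = field + len;

    while (count < MAXNAMES) {
        const char *bar = memchr(p, '|', (size_t)(end - p));
        const char *stop = bar ? bar : end;
        size_t n = (size_t)(stop - p);

        if (n >= MAXNAMELEN)
            n = MAXNAMELEN - 1;     /* truncate an over-long name */
        memcpy(names[count], p, n);
        names[count][n] = '\0';
        count++;
        if (bar == NULL)
            break;
        p = bar + 1;
    }
    return count;
}

/* Return the mode indicator of a short name ("w" for "myterm-w"),
 * or NULL if the name carries no hyphenated suffix. */
const char *mode_indicator(const char *name)
{
    const char *dash = strrchr(name, '-');
    return dash ? dash + 1 : NULL;
}
```

The first element returned is the common abbreviation and the last is the
long name, matching the convention described above.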

Learn About the Capabilities 

After you complete the string of terminal names for your description, 
you have to learn about the terminal's capabilities so that you can properly 
describe them. To learn about the capabilities your terminal has, you 
should do the following: 

■ See the owner's manual for your terminal. It should have informa- 
tion about the capabilities available and the character strings that 
make up the sequence transmitted from the keyboard for each capa- 
bility. 

■ Test the keys on your terminal to see what they transmit, if this
information is not available in the manual. You can test the keys in
one of the following ways. Type:

        stty -echo; cat -vu
        Type in the keys you want to test;
        for example, see what right arrow transmits.
        <CR>
        <CTRL-D>
        stty echo

or

        cat >/dev/null
        Type in the escape sequences you want to test;
        for example, see what \E[H transmits.
        <CTRL-D>

The first line in each of these testing methods sets up the terminal to
carry out the tests. The <CTRL-D> helps return the terminal to its
normal settings.

■ See the terminfo(4) manual page. It lists all the capability names you 
have to use in a terminal description. 

The following section, "Specify Capabilities," gives details.

Specify Capabilities 

Once you know the capabilities of your terminal, you have to describe 
them in your terminal description. You describe them with a string of 
comma-separated fields that contain the abbreviated terminfo name and, in 
some cases, the terminal's value for each capability. For example, bel is the 
abbreviated name for the beeping or ringing capability. On most terminals, 
a CTRL-G is the instruction that produces a beeping sound. Therefore, the 
beeping capability would be shown in the terminal description as bel=^G,.

The list of capabilities may continue onto multiple lines as long as 
white space (that is, tabs and spaces) begins every line but the first of the 
description. Comments can be included in the description by putting a # at 
the beginning of the line. 

The terminfo(4) manual page has a complete list of the capabilities you 
can use in a terminal description. This list contains the name of the capa- 
bility, the abbreviated name used in the database, the two-letter code that 
corresponds to the old termcap database name, and a short description of 
the capability. The abbreviated name that you will use in your database 
descriptions is shown in the column titled "Capname." 
NOTE

For a curses program to run on any given terminal, its description in the
terminfo database must include, at least, the capabilities to move a cursor
in all four directions and to clear the screen.



A terminal's character sequence (value) for a capability can be a keyed 
operation (like CTRL-G), a numeric value, or a parameter string containing 
the sequence of operations required to achieve the particular capability. In 
a terminal description, certain characters are used after the capability name 
to show what type of character sequence is required. Explanations of these 
characters follow: 

#   This shows a numeric value is to follow. This character follows a
    capability that needs a number as a value. For example, the number
    of columns is defined as cols#80,.

= This shows that the capability value is the character string that fol- 
lows. This string instructs the terminal how to act and may actually 
be a sequence of commands. There are certain characters used in 
the instruction strings that have special meanings. These special 
characters follow: 

    ^         This shows a control character is to be used. For example,
              the beeping sound is produced by a CTRL-G. This would
              be shown as ^G.

    \E or \e  These characters followed by another character show an
              escape instruction. An entry of \EC would transmit to the
              terminal as ESCAPE-C.

    \n        These characters provide a <NL> character sequence.

    \l        These characters provide a linefeed character sequence.

    \r        These characters provide a return character sequence.

    \t        These characters provide a tab character sequence.

    \b        These characters provide a backspace character sequence.

    \f        These characters provide a formfeed character sequence.

    \s        These characters provide a space character sequence.

    \nnn      This is a character whose three-digit octal is nnn, where
              nnn can be one to three digits.

    $< >      These symbols are used to show a delay in milliseconds.
              The desired length of delay is enclosed inside the "less
              than/greater than" symbols (< >). The amount of delay
              may be a whole number, a numeric value to one decimal
              place (tenths), or either form followed by an asterisk (*).
              The * shows that the delay will be proportional to the
              number of lines affected by the operation. For example, a
              20-millisecond delay per line would appear as $<20*>.
              See the terminfo(4) manual page for more information
              about delays and padding.

Sometimes, it may be necessary to comment out a capability so that
this particular field is ignored. This is done by placing a period ( . )
in front of the abbreviated name for the capability. For example, if you
would like to comment out the beeping capability, the description entry
would appear as

        .bel=^G,

With this background information about specifying capabilities, let's add
the capability string to our description of myterm. We'll consider basic,
screen-oriented, keyboard-entered, and parameter string capabilities.

Basic Capabilities

Some capabilities common to most terminals are bells, columns, lines on
the screen, and overstriking of characters, if necessary. Suppose our
fictitious terminal has these and a few other capabilities, as listed below.
Note that the list gives the abbreviated terminfo name for each capability
in the parentheses following the capability description:

■ An automatic wrap around to the beginning of the next line
whenever the cursor reaches the right-hand margin (am).

■ The ability to produce a beeping sound. The instruction required to
produce the beeping sound is ^G (bel).

■ An 80-column wide screen (cols).

■ A 30-line long screen (lines).

■ Use of xon/xoff protocol (xon). 

By combining the name string (see the section "Name the Terminal") 
and the capability descriptions that we now have, we get the following gen- 
eral terminfo database entry: 

        myterm|mytm|mine|fancy|terminal|My FANCY Terminal,
                am, bel=^G, cols#80, lines#30, xon,
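
The entry above mixes three kinds of fields: a bare name (am, xon), a name
followed by # and a number (cols#80), and a name followed by = and a string
(bel=^G). As a minimal sketch, a program reading such a source entry could
tell them apart as follows. The function classify_cap and the enum cap_type
are hypothetical names used only for illustration; this is not how tic is
implemented.

```c
#include <assert.h>
#include <string.h>

enum cap_type { CAP_BOOLEAN, CAP_NUMERIC, CAP_STRING, CAP_COMMENTED };

/* Hypothetical helper: classify one field of a terminfo source entry
 * (without its trailing comma) by the character that follows the
 * capability name.  A leading period marks a commented-out field. */
enum cap_type classify_cap(const char *field)
{
    size_t name_len;

    assert(field != NULL);
    if (field[0] == '.')
        return CAP_COMMENTED;
    name_len = strcspn(field, "#=");    /* length of the capability name */
    if (field[name_len] == '#')
        return CAP_NUMERIC;
    if (field[name_len] == '=')
        return CAP_STRING;
    return CAP_BOOLEAN;
}
```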

Screen-Oriented Capabilities 

Screen-oriented capabilities manipulate the contents of a screen. Our 
example terminal myterm has the following screen-oriented capabilities. 
Again, the abbreviated command associated with the given capability is 
shown in parentheses. 

■ A <CR> is a CTRL-M (cr). 

■ A cursor up one line motion is a CTRL-K (cuu1).

■ A cursor down one line motion is a CTRL-J (cud1).

■ Moving the cursor to the left one space is a CTRL-H (cub1).

■ Moving the cursor to the right one space is a CTRL-L (cuf1).

■ Entering reverse video mode is an ESCAPE-D (smso). 

■ Exiting reverse video mode is an ESCAPE-Z (rmso). 

■ A clear to the end of a line sequence is an ESCAPE-K and should 
have a 3-millisecond delay (el). 

■ A terminal scrolls when receiving a <NL> at the bottom of a page 
(ind). 

The revised terminal description for myterm including these screen- 
oriented capabilities follows: 
        myterm|mytm|mine|fancy|terminal|My FANCY Terminal,
                am, bel=^G, cols#80, lines#30, xon,
                cr=^M, cuu1=^K, cud1=^J, cub1=^H, cuf1=^L,
                smso=\ED, rmso=\EZ, el=\EK$<3>, ind=\n,
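
String values such as cr=^M and smso=\ED use the notation listed under
"Specify Capabilities." The following sketch shows one way that notation
could be turned into the bytes sent to the terminal. decode_cap is a
hypothetical helper written for this example; it is not a curses routine,
it copies padding specifications such as $<3> through unchanged, and it
gives ^? no special treatment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: translate the source notation of a string value
 * into raw bytes.  Handles ^X, \E and \e, \n, \l, \r, \t, \b, \f, \s,
 * and \nnn (one to three octal digits).  Writes at most outsize bytes
 * to out (no terminating NUL) and returns the number written. */
size_t decode_cap(const char *src, unsigned char *out, size_t outsize)
{
    size_t n = 0;

    assert(src != NULL && out != NULL);
    while (*src != '\0' && n < outsize) {
        unsigned char c;

        if (*src == '^' && src[1] != '\0') {
            c = (unsigned char)(src[1] & 037);   /* ^G becomes 007 */
            src += 2;
        } else if (*src == '\\' && src[1] != '\0') {
            src++;
            switch (*src) {
            case 'E': case 'e': c = 033;  src++; break;
            case 'n': case 'l': c = '\n'; src++; break;
            case 'r': c = '\r'; src++; break;
            case 't': c = '\t'; src++; break;
            case 'b': c = '\b'; src++; break;
            case 'f': c = '\f'; src++; break;
            case 's': c = ' ';  src++; break;
            default:
                if (*src >= '0' && *src <= '7') {
                    int digits = 0, value = 0;

                    while (digits < 3 && *src >= '0' && *src <= '7') {
                        value = value * 8 + (*src - '0');
                        src++;
                        digits++;
                    }
                    c = (unsigned char)value;
                } else {
                    c = (unsigned char)*src++;   /* unknown escape: keep char */
                }
                break;
            }
        } else {
            c = (unsigned char)*src++;
        }
        out[n++] = c;
    }
    return n;
}
```

Decoding "^M" yields the single byte 015, and decoding "\ED" yields the
escape character followed by the letter D.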

Keyboard-Entered Capabilities 

Keyboard-entered capabilities are sequences generated when a key is 
typed on a terminal keyboard. Most terminals have, at least, a few special 
keys on their keyboard, such as arrow keys and the backspace key. Our 
example terminal has several of these keys whose sequences are as follows:

■ The backspace key generates a CTRL-H (kbs).

■ The up arrow key generates an ESCAPE-[ A (kcuu1).

■ The down arrow key generates an ESCAPE-[ B (kcud1).

■ The right arrow key generates an ESCAPE-[ C (kcuf1).

■ The left arrow key generates an ESCAPE-[ D (kcub1).

■ The home key generates an ESCAPE-[ H (khome).

Adding this new information to our database entry for myterm produces:

        myterm|mytm|mine|fancy|terminal|My FANCY Terminal,
                am, bel=^G, cols#80, lines#30, xon,
                cr=^M, cuu1=^K, cud1=^J, cub1=^H, cuf1=^L,
                smso=\ED, rmso=\EZ, el=\EK$<3>, ind=\n,
                kbs=^H, kcuu1=\E[A, kcud1=\E[B, kcuf1=\E[C,
                kcub1=\E[D, khome=\E[H,
Parameter String Capabilities

Parameter string capabilities are capabilities that can take parameters — 
for example, those used to position a cursor on a screen or turn on a combi- 
nation of video modes. To address a cursor, the cup capability is used and 
is passed two parameters: the row and column to address. String capabili- 
ties, such as cup and set attributes (sgr) capabilities, are passed arguments in 
a terminfo program by the tparm( ) routine. 

The arguments to string capabilities are manipulated with special
sequences similar to those found in a printf(3S) statement. In addition,
many of the features found on a simple stack-based RPN calculator are
available. cup, as noted above, takes two arguments: the row and column.
sgr takes nine arguments, one for each of the nine video attributes. See
terminfo(4) for the list and order of the attributes and further examples of
sgr.

Our fancy terminal's cursor position sequence requires a row and 
column to be output as numbers separated by a semicolon, preceded by 
ESCAPE-[ and followed with H. The coordinate numbers are 1-based rather 
than 0-based. Thus, to move to row 5, column 18, from (0,0), the sequence 
'ESCAPE-[ 6 ; 19 H' would be output. 

Integer arguments are pushed onto the stack with a '%p' sequence fol- 
lowed by the argument number, such as '%p2' to push the second argument. 
A shorthand sequence to increment the first two arguments is '%i'. To out- 
put the top number on the stack as a decimal, a '%d' sequence is used, 
exactly as in printf. Our terminal's cup sequence is built up as follows: 



        cup=    Meaning

        \E[     output ESCAPE-[
        %i      increment the two arguments
        %p1     push the 1st argument (the row) onto the stack
        %d      output the row as a decimal
        ;       output a semi-colon
        %p2     push the 2nd argument (the column) onto the stack
        %d      output the column as a decimal
        H       output the trailing letter

or

        cup=\E[%i%p1%d;%p2%dH,
Adding this new information to our database entry for myterm produces:

        myterm|mytm|mine|fancy|terminal|My FANCY Terminal,
                am, bel=^G, cols#80, lines#30, xon,
                cr=^M, cuu1=^K, cud1=^J, cub1=^H, cuf1=^L,
                smso=\ED, rmso=\EZ, el=\EK$<3>, ind=\n,
                kbs=^H, kcuu1=\E[A, kcud1=\E[B, kcuf1=\E[C,
                kcub1=\E[D, khome=\E[H,
                cup=\E[%i%p1%d;%p2%dH,



See terminfo(4) for more information about parameter string capabilities. 
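
To make the stack mechanics concrete, here is a much-reduced sketch of how
a cup string for myterm could be expanded. expand_cup is a hypothetical
stand-in written for this example, not the real tparm( ) routine: it
understands only the %i, %p1, %p2, %d, and %% sequences, and it assumes
that \E has already been converted to the escape character.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced stand-in for tparm(): expand fmt with the two
 * arguments p1 (row) and p2 (column) into out.  Returns 0 on success,
 * or -1 on an unsupported sequence, an empty stack, or a full buffer. */
int expand_cup(const char *fmt, int p1, int p2, char *out, size_t outsize)
{
    int stack[8];
    int depth = 0;
    size_t n = 0;

    assert(fmt != NULL && out != NULL && outsize > 0);
    while (*fmt != '\0') {
        if (*fmt != '%') {              /* ordinary character: copy it */
            if (n + 1 >= outsize)
                return -1;
            out[n++] = *fmt++;
            continue;
        }
        fmt++;                          /* look at the character after % */
        switch (*fmt) {
        case 'i':                       /* make both arguments 1-based */
            p1++;
            p2++;
            break;
        case 'p':                       /* push argument 1 or 2 */
            fmt++;
            if (depth >= 8 || (*fmt != '1' && *fmt != '2'))
                return -1;
            stack[depth++] = (*fmt == '1') ? p1 : p2;
            break;
        case 'd': {                     /* pop and print as decimal */
            int written;

            if (depth == 0)
                return -1;
            written = snprintf(out + n, outsize - n, "%d", stack[--depth]);
            if (written < 0 || (size_t)written >= outsize - n)
                return -1;
            n += (size_t)written;
            break;
        }
        case '%':                       /* literal percent sign */
            if (n + 1 >= outsize)
                return -1;
            out[n++] = '%';
            break;
        default:
            return -1;
        }
        fmt++;
    }
    out[n] = '\0';
    return 0;
}
```

Called with the format "\033[%i%p1%d;%p2%dH", row 5, and column 18, the
sketch produces ESCAPE-[ 6 ; 19 H, the sequence given in the text.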

Compile the Description 

The terminfo database entries are compiled using the tic compiler. This 
compiler translates terminfo database entries from the source format into 
the compiled format. 

The source for the description is usually kept in a file suffixed with .ti.
For example, the description of myterm would be in a source file named
myterm.ti. The compiled description of myterm would usually be placed in
/usr/lib/terminfo/m/myterm, since the first letter in the description entry
is m. Links would also be made to synonyms of myterm, for example, to
/f/fancy. If the environment variable $TERMINFO were set to a directory
and exported before the entry was compiled, the compiled entry would be
placed in the $TERMINFO directory. All programs using the entry would
then look in the new directory for the description file if $TERMINFO were
set, before looking in the default /usr/lib/terminfo. The general format for
the tic compiler is as follows:

        tic [-v] [-c] file

The -v option causes the compiler to trace its actions and output
information about its progress. The -c option causes a check for errors; it
may be combined with the -v option. file shows what file is to be compiled.
If you want to compile more than one file at the same time, you have to
first use cat(1) to join them together. The following command line shows
how to compile the terminfo source file for our fictitious terminal:

        tic -v myterm.ti<CR>
        (The trace information appears as the compilation proceeds.)

Refer to the tic(1M) manual page in the System Administrator's Reference
Manual for more information about the compiler.
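
The placement rule for compiled entries can be sketched as a small path
computation. compiled_entry_path is a hypothetical helper, not part of tic
or curses, and it assumes that a $TERMINFO directory uses the same
first-letter subdirectory layout as /usr/lib/terminfo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEFAULT_TERMINFO "/usr/lib/terminfo"

/* Hypothetical helper: build the path of the compiled entry for name.
 * terminfo_dir is the value of $TERMINFO, or NULL (or "") if unset.
 * Returns 0 on success, -1 if the path does not fit in out. */
int compiled_entry_path(const char *terminfo_dir, const char *name,
                        char *out, size_t outsize)
{
    const char *dir;
    int written;

    assert(name != NULL && name[0] != '\0');
    assert(out != NULL && outsize > 0);
    dir = (terminfo_dir != NULL && terminfo_dir[0] != '\0')
              ? terminfo_dir : DEFAULT_TERMINFO;
    /* directory / first letter of the name / name */
    written = snprintf(out, outsize, "%s/%c/%s", dir, name[0], name);
    return (written < 0 || (size_t)written >= outsize) ? -1 : 0;
}
```

A caller would typically pass getenv("TERMINFO") as the first argument.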

Test the Description 

Let's consider three ways to test a terminal description. First, you can 
test it by setting the environment variable $TERMINFO to the path name 
of the directory containing the description. If programs run the same on 
the new terminal as they did on the older known terminals, then the new 
description is functional. 

Second, you can test for correct insert line padding by commenting out
xon in the description and then editing (using vi(1)) a large file (over 100
lines) at 9600 baud (if possible), and deleting about 15 lines from the middle
of the screen. Type u (undo) several times quickly. If the terminal messes
up, then more padding is usually required. A similar test can be used for
inserting a character.

Third, you can use the tput(1) command. This command outputs a
string or an integer according to the type of capability being described. If
the capability is a Boolean expression, then tput sets the exit code (0 for
TRUE, 1 for FALSE) and produces no output. The general format for the
tput command is as follows:

        tput [-Ttype] capname

The type of terminal you are requesting information about is identified with
the -Ttype option. Usually, this option is not necessary because the default
terminal name is taken from the environment variable $TERM. The
capname field is used to show what capability to output from the terminfo
database.

The following command line shows how to output the "clear screen" 
character sequence for the terminal being used: 

tput clear 

(The screen is cleared.) 
The following command line shows how to output the number of
columns for the terminal being used:

        tput cols

        (The number of columns used by the terminal appears here.)

The tput(1) manual page found in the User's Reference Manual contains
more information on the usage and possible messages associated with this
command.



Comparing or Printing terminfo Descriptions 

Sometimes you may want to compare two terminal descriptions or
quickly look at a description without going to the terminfo source
directory. The infocmp(1M) command was designed to help you with both of
these tasks. To compare two descriptions of the same terminal, for example,
the command sequence

        mkdir /tmp/old /tmp/new
        TERMINFO=/tmp/old tic old5420.ti
        TERMINFO=/tmp/new tic new5420.ti
        infocmp -A /tmp/old -B /tmp/new -d 5420 5420

compares the old and new 5420 entries.

To print out the terminfo source for the 5420, type

        infocmp -I 5420



Converting a termcap Description to a terminfo 
Description 



The terminfo database is designed to take the place of the termcap 
database. Because of the many programs and processes that have 
been written with and for the termcap database, it is not feasible to 
do a complete cutover at one time. Any conversion from termcap to 
terminfo requires some experience with both databases. All entries 
into the databases should be handled with extreme caution. These 
files are important to the operation of your terminal. 
The captoinfo(1M) command converts termcap(4) descriptions to
terminfo(4) descriptions. When a file is passed to captoinfo, it looks for
termcap descriptions and writes the equivalent terminfo descriptions on the
standard output. For example,

        captoinfo /etc/termcap

converts the file /etc/termcap to terminfo source, preserving comments and
other extraneous information within the file. The command line

        captoinfo

looks up the current terminal in the termcap database, as specified by the
$TERM and $TERMCAP environment variables, and converts it to
terminfo.

If you must have both termcap and terminfo terminal descriptions,
keep the terminfo description only and use infocmp -C to get the termcap
descriptions. This is recommended because the terminfo entry will be more
complete, descriptive, and accurate than the termcap entry possibly could
be.

If you have been using cursor optimization programs with the
-ltermcap or -ltermlib option in the cc command line, those programs will
still be functional. However, these options should be replaced with the
-lcurses option.
The following examples demonstrate uses of curses routines.

The editor Program 

This program illustrates how to use curses routines to write a screen 
editor. For simplicity, editor keeps the buffer in stdscr; obviously, a real 
screen editor would have a separate data structure for the buffer. This pro- 
gram has many other simplifications: no provision is made for files of any 
length other than the size of the screen, for lines longer than the width of 
the screen, or for control characters in the file. 

Several points about this program are worth making. First, it uses the
move( ), mvaddstr( ), flash( ), wnoutrefresh( ) and clrtoeol( ) routines. These
routines are all discussed in this chapter under "Working with curses
Routines."

Second, it also uses some curses routines that we have not discussed. 
For example, the function to write out a file uses the mvinch( ) routine, 
which returns a character in a window at a given position. The data struc- 
ture used to write out a file does not keep track of the number of characters 
in a line or the number of lines in the file, so trailing blanks are eliminated 
when the file is written. The program also uses the insch( ), delch( ), 
insertln( ), and deleteln( ) routines. These functions insert and delete a 
character or line. See curses(3X) for more information about these routines. 

Third, the editor command interpreter accepts special keys, as well as
ASCII characters. On one hand, new users find an editor that handles
special keys easier to learn. For example, it's easier for new users to use
the arrow keys to move a cursor than it is to memorize that the letter h
means left, j means down, k means up, and l means right. On the other
hand, experienced users usually like having the ASCII characters to avoid
moving their hands from the home row position to use special keys.



NOTE 



Because not all terminals have arrow keys, your curses programs will 
work on more terminals if there is an ASCII character associated with 
each special key. 
Fourth, the CTRL-L command illustrates a feature most programs using
curses routines should have. Often some program beyond the control of
the routines writes something to the screen (for instance, a broadcast
message) or some line noise affects the screen so much that the routines
cannot keep track of it. A user invoking editor can type CTRL-L, causing the
screen to be cleared and redrawn with a call to wrefresh(curscr).

Finally, another important point is that the input command is ter- 
minated by CTRL-D, not the escape key. It is very tempting to use escape 
as a command, since escape is one of the few special keys available on every 
keyboard. (Return and break are the only others.) However, using escape 
as a separate key introduces an ambiguity. Most terminals use sequences of 
characters beginning with escape (i.e., escape sequences) to control the ter- 
minal and have special keys that send escape sequences to the computer. If 
a computer receives an escape from a terminal, it cannot tell whether the 
user depressed the escape key or whether a special key was pressed. 

editor and other curses programs handle the ambiguity by setting a 
timer. If another character is received during this time, and if that character 
might be the beginning of a special key, the program reads more input 
until either a full special key is read, the time out is reached, or a character 
is received that could not have been generated by a special key. While this 
strategy works most of the time, it is not foolproof. It is possible for the 
user to press escape, then to type another key quickly, which causes the 
curses program to think a special key has been pressed. Also, a pause 
occurs until the escape can be passed to the user program, resulting in a 
slower response to the escape key. 

Many existing programs use escape as a fundamental command, which 
cannot be changed without infuriating a large class of users. These pro- 
grams cannot make use of special keys without dealing with this ambiguity, 
and at best must resort to a time-out solution. The moral is clear: when 
designing your curses programs, avoid the escape key. 
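
The ambiguity can be seen in a small sketch of the decision a program has
to make each time another character arrives. classify_input is a
hypothetical helper written for this example (curses performs this work
internally when keypad( ) is enabled), and the key table holds the arrow and
home key sequences of the fictitious myterm terminal.

```c
#include <assert.h>
#include <string.h>

enum match { MATCH_NONE, MATCH_PREFIX, MATCH_FULL };

/* Sequences sent by myterm's special keys (kcuu1, kcud1, kcuf1,
 * kcub1, khome), with \E written as the escape character. */
static const char *const key_sequences[] = {
    "\033[A", "\033[B", "\033[C", "\033[D", "\033[H"
};

/* Hypothetical helper: decide whether the len bytes read so far are a
 * complete special key, the start of one, or neither.  MATCH_PREFIX is
 * the ambiguous case: the program must wait (up to its time-out) for
 * more input before it can tell a lone escape from a special key. */
enum match classify_input(const char *buf, size_t len)
{
    size_t i;
    enum match result = MATCH_NONE;

    assert(buf != NULL);
    for (i = 0; i < sizeof key_sequences / sizeof key_sequences[0]; i++) {
        size_t klen = strlen(key_sequences[i]);

        if (len == klen && memcmp(buf, key_sequences[i], len) == 0)
            return MATCH_FULL;
        if (len < klen && memcmp(buf, key_sequences[i], len) == 0)
            result = MATCH_PREFIX;
    }
    return result;
}
```

A lone escape character is reported as MATCH_PREFIX, which is exactly why
a program that uses escape as a command must fall back on a timer.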



/* editor: A screen-oriented editor.  The user
 * interface is similar to a subset of vi.
 * The buffer is kept in stdscr to simplify
 * the program.
 */

#include <curses.h>

#define CTRL(c) ((c) & 037)

main(argc, argv)
int argc;
char **argv;
{
    extern void perror(), exit();
    int i, n, l;
    int c;
    int line = 0;
    FILE *fd;

    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s file\n", argv[0]);
        exit(1);
    }

    fd = fopen(argv[1], "r");
    if (fd == NULL)
    {
        perror(argv[1]);
        exit(2);
    }

    initscr();
    cbreak();
    nonl();
    noecho();
    idlok(stdscr, TRUE);
    keypad(stdscr, TRUE);

    /* Read in the file */
    while ((c = getc(fd)) != EOF)
    {
        if (c == '\n')
            line++;
        if (line > LINES - 2)
            break;
        addch(c);
    }
    fclose(fd);

    move(0,0);
    refresh();
    edit();

    /* Write out the file */
    fd = fopen(argv[1], "w");
    for (l = 0; l < LINES - 1; l++)
    {
        n = len(l);
        for (i = 0; i < n; i++)
            putc(mvinch(l, i) & A_CHARTEXT, fd);
        putc('\n', fd);
    }
    fclose(fd);

    endwin();
    exit(0);
}



/* Length of a screen line, not counting trailing blanks */
len(lineno)
int lineno;
{
    int linelen = COLS - 1;

    while (linelen >= 0 && mvinch(lineno, linelen) == ' ')
        linelen--;
    return linelen + 1;
}

/* Global value of current cursor position */
int row, col;

edit()
{
    int c;

    for (;;)
    {
        move(row, col);
        refresh();
        c = getch();

        /* Editor commands */
        switch (c)
        {

        /* hjkl and arrow keys: move cursor
         * in direction indicated */
        case 'h':
        case KEY_LEFT:
            if (col > 0)
                col--;
            else
                flash();
            break;

        case 'j':
        case KEY_DOWN:
            if (row < LINES - 1)
                row++;
            else
                flash();
            break;

        case 'k':
        case KEY_UP:
            if (row > 0)
                row--;
            else
                flash();
            break;

        case 'l':
        case KEY_RIGHT:
            if (col < COLS - 1)
                col++;
            else
                flash();
            break;

        /* i: enter input mode */
        case KEY_IC:
        case 'i':
            input();
            break;

        /* x: delete current character */
        case KEY_DC:
        case 'x':
            delch();
            break;

        /* o: open up a new line and enter input mode */
        case KEY_IL:
        case 'o':
            move(++row, col = 0);
            insertln();
            input();
            break;

        /* d: delete current line */
        case KEY_DL:
        case 'd':
            deleteln();
            break;

        /* ^L: redraw screen */
        case KEY_CLEAR:
        case CTRL('L'):
            wrefresh(curscr);
            break;

        /* w: write and quit */
        case 'w':
            return;

        /* q: quit without writing */
        case 'q':
            endwin();
            exit(2);

        default:
            flash();
            break;
        }
    }
}



/*
 * Insert mode: accept characters and insert them.
 * End with ^D or EIC
 */
input()
{
    int c;

    standout();
    mvaddstr(LINES - 1, COLS - 20, "INPUT MODE");
    standend();
    move(row, col);
    refresh();
    for (;;)
    {
        c = getch();
        if (c == CTRL('D') || c == KEY_EIC)
            break;
        insch(c);
        move(row, ++col);
        refresh();
    }
    move(LINES - 1, COLS - 20);
    clrtoeol();
    move(row, col);
    refresh();
}








Examples 



The highlight Program 



This program illustrates a use of the routine attrset( ). highlight reads a
text file and uses embedded escape sequences to control attributes. \U turns
on underlining, \B turns on bold, and \N restores the default output
attributes.

Note the first call to scrollok( ), a routine that we have not previously
discussed (see curses(3X)). This routine allows the terminal to scroll if the
file is longer than one screen. When an attempt is made to draw past the
bottom of the screen, curses automatically scrolls the terminal up a line
and calls refresh( ).
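
The way highlight interprets its input can be modeled without curses. The
sketch below is only an illustration of the scanning logic: scan_highlight
and enum attr are hypothetical names, and where the real program calls
attrset( ) and addch( ), the sketch simply records each character together
with the attribute in effect.

```c
#include <assert.h>
#include <stddef.h>

enum attr { ATTR_NORMAL, ATTR_BOLD, ATTR_UNDERLINE };

/* Hypothetical, curses-free model of highlight's input loop.  Copies
 * the displayable characters of in to chars and records, in attrs, the
 * attribute in effect for each one.  \B, \U and \N change the current
 * attribute; any other backslash pair is kept as two characters.
 * Stores at most max characters and returns the number stored. */
size_t scan_highlight(const char *in, char *chars, enum attr *attrs,
                      size_t max)
{
    enum attr current = ATTR_NORMAL;
    size_t n = 0;

    assert(in != NULL && chars != NULL && attrs != NULL);
    while (*in != '\0' && n < max) {
        if (in[0] == '\\' && in[1] != '\0') {
            char c2 = in[1];

            in += 2;
            if (c2 == 'B') { current = ATTR_BOLD;      continue; }
            if (c2 == 'U') { current = ATTR_UNDERLINE; continue; }
            if (c2 == 'N') { current = ATTR_NORMAL;    continue; }
            /* not a recognized sequence: keep both characters */
            chars[n] = '\\';
            attrs[n] = current;
            n++;
            if (n < max) {
                chars[n] = c2;
                attrs[n] = current;
                n++;
            }
            continue;
        }
        chars[n] = *in++;
        attrs[n] = current;
        n++;
    }
    return n;
}
```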



/*
 * highlight: a program to turn \U, \B, and
 * \N sequences into highlighted
 * output, allowing words to be
 * displayed underlined or in bold.
 */

#include <curses.h>

main(argc, argv)
int argc;
char **argv;
{
    FILE *fd;
    int c, c2;
    void exit(), perror();

    if (argc != 2)
    {
        fprintf(stderr, "Usage: highlight file\n");
        exit(1);
    }

    fd = fopen(argv[1], "r");

    if (fd == NULL)
    {
        perror(argv[1]);
        exit(2);
    }

    initscr();
    scrollok(stdscr, TRUE);
    nonl();
    while ((c = getc(fd)) != EOF)
    {
        if (c == '\\')
        {
            c2 = getc(fd);
            switch (c2)
            {
            case 'B':
                attrset(A_BOLD);
                continue;
            case 'U':
                attrset(A_UNDERLINE);
                continue;
            case 'N':
                attrset(0);
                continue;
            }
            /* Not a recognized sequence: show both characters */
            addch(c);
            addch(c2);
        }
        else
            addch(c);
    }
    fclose(fd);
    refresh();
    endwin();
    exit(0);
}
The scatter Program

This program takes the first LINES - 1 lines of characters from the stan- 
dard input and displays the characters on a terminal screen in a random 
order. For this program to work properly, the input file should not contain 
tabs or non-printing characters. 




/*
 *      The scatter program.
 */

#include <curses.h>
#include <sys/types.h>

extern time_t time();

#define MAXLINES 120
#define MAXCOLS 160
char s[MAXLINES][MAXCOLS];      /* Screen Array */
int T[MAXLINES][MAXCOLS];       /* Tag Array - Keeps track of
                                 * the number of characters
                                 * printed and their positions. */

main()
{
    register int row = 0, col = 0;
    register int c;
    int char_count = 0;
    time_t t;
    void exit(), srand();

    initscr();
    for (row = 0; row < MAXLINES; row++)
        for (col = 0; col < MAXCOLS; col++)
            s[row][col] = ' ';

    col = row = 0;
    /* Read screen in */
    while ((c = getchar()) != EOF && row < LINES) {
        if (c != '\n')
        {
            /* Place char in screen array */
            s[row][col++] = c;
            if (c != ' ')
                char_count++;
        }
        else
        {
            col = 0;
            row++;
        }
    }

    time(&t);       /* Seed the random number generator */
    srand((unsigned)t);

    while (char_count)
    {
        row = rand() % LINES;
        col = (rand() >> 2) % COLS;
        if (T[row][col] != 1 && s[row][col] != ' ')
        {
            move(row, col);
            addch(s[row][col]);
            T[row][col] = 1;
            char_count--;
            refresh();
        }
    }
    endwin();
    exit(0);
}








The show Program 

show pages through a file, showing one screen of its contents each time 
you depress the space bar. The program calls cbreak( ) so that you can 
depress the space bar without having to hit return; it calls noecho( ) to 
prevent the space from echoing on the screen. The nonl( ) routine, which 
we have not previously discussed, is called to enable more cursor optimiza- 
tion. The idlok( ) routine, which we also have not discussed, is called to 
allow insert and delete line. (See curses(3X) for more information about 
these routines). Also notice that clrtoeol( ) and clrtobot( ) are called. 

By creating an input file for show made up of screen-sized (about 24
lines) pages, each varying slightly from the previous page, nearly any
exercise for a curses program can be created. This type of input file is
called a show script.




#include <curses.h>
#include <signal.h>

main(argc, argv)
int argc;
char *argv[];
{
    FILE *fd;
    char linebuf[BUFSIZ];
    int line;
    void done(), perror(), exit();

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        exit(1);
    }

    if ((fd=fopen(argv[1], "r")) == NULL)
    {
        perror(argv[1]);
        exit(2);
    }

    signal(SIGINT, done);

    initscr();
    noecho();
    cbreak();
    nonl();
    idlok(stdscr, TRUE);

    while(1)
    {
        move(0,0);
        for (line = 0; line < LINES; line++)
        {
            if (!fgets(linebuf, sizeof linebuf, fd))
            {
                clrtobot();
                done();
            }
            move(line, 0);
            printw("%s", linebuf);
        }
        refresh();
        if (getch() == 'q')
            done();
    }
}

/* Clear the bottom line, restore the terminal, and exit */
void done()
{
    move(LINES - 1, 0);
    clrtoeol();
    refresh();
    endwin();
    exit(0);
}
The two Program

This program pages through a file, writing one page to the terminal
from which the program is invoked and the next page to the terminal
named on the command line. It then waits for a space to be typed on either
terminal and writes the next page to the terminal at which the space is
typed.

two is just a simple example of a two-terminal curses program. It does
not handle notification; instead, it requires the name and type of the second
terminal on the command line. As written, the command "sleep 100000"
must be typed at the second terminal to put it to sleep while the program
runs, and the user of the first terminal must have both read and write
permission on the second terminal.




#include <curses.h>
#include <signal.h>

SCREEN *me, *you;
SCREEN *set_term();

FILE *fd, *fdyou;
char linebuf[512];

main(argc, argv)
int argc;
char **argv;
{
        void done(), exit();
        unsigned sleep();
        char *getenv();
        int c;

        if (argc != 4)
        {
                fprintf(stderr, "Usage: two othertty otherttytype inputfile\n");
                exit(1);
        }

        fd = fopen(argv[3], "r");
        fdyou = fopen(argv[1], "w+");
        signal(SIGINT, done);           /* die gracefully */

        me = newterm(getenv("TERM"), stdout, stdin); /* initialize my tty */
        you = newterm(argv[2], fdyou, fdyou); /* Initialize the other terminal */

        set_term(me);                   /* Set modes for my terminal */
        noecho();                       /* turn off tty echo */
        cbreak();                       /* enter cbreak mode */
        nonl();                         /* Allow linefeed */
        nodelay(stdscr, TRUE);          /* No hang on input */

        set_term(you);                  /* Set modes for other terminal */
        noecho();
        cbreak();
        nonl();
        nodelay(stdscr, TRUE);

        /* Dump first screen full on my terminal */
        dump_page(me);

        /* Dump second screen full on the other terminal */
        dump_page(you);

        for (;;)                        /* for each screen full */
        {
                set_term(me);
                c = getch();
                if (c == 'q')           /* wait for user to read it */
                        done();
                if (c == ' ')
                        dump_page(me);

                set_term(you);
                c = getch();
                if (c == 'q')           /* wait for user to read it */
                        done();
                if (c == ' ')
                        dump_page(you);
                sleep(1);
        }
}

dump_page(term)
SCREEN *term;
{
        int line;

        set_term(term);
        move(0, 0);
        for (line = 0; line < LINES - 1; line++) {
                if (fgets(linebuf, sizeof linebuf, fd) == NULL) {
                        clrtobot();
                        done();
                }
                mvaddstr(line, 0, linebuf);
        }
        standout();
        mvprintw(LINES - 1, 0, "--More--");
        standend();
        refresh();                      /* sync screen */
}

/*
 * Clean up and exit.
 */
void done()
{
        /* Clean up first terminal */
        set_term(you);
        move(LINES - 1, 0);             /* to lower left corner */
        clrtoeol();                     /* clear bottom line */
        refresh();                      /* flush out everything */
        endwin();                       /* curses cleanup */

        /* Clean up second terminal */
        set_term(me);
        move(LINES - 1, 0);             /* to lower left corner */
        clrtoeol();                     /* clear bottom line */
        refresh();                      /* flush out everything */
        endwin();                       /* curses cleanup */

        exit(0);
}
The window Program

This example program demonstrates the use of multiple windows. The
main display is kept in stdscr. When you want to put something other than
what is in stdscr on the physical terminal screen temporarily, a new
window is created covering part of the screen. A call to wrefresh( ) for that
window causes it to be written over the stdscr image on the terminal
screen. Calling refresh( ) on stdscr results in the original window being
redrawn on the screen. Note the calls to the touchwin( ) routine (which we
have not discussed; see curses(3X)) that occur before writing out a
window over an existing window on the terminal screen. This routine prevents
screen optimization in a curses program. If you have trouble refreshing a
new window that overlaps an old window, it may be necessary to call
touchwin( ) for the new window to get it completely written out.



#include <curses.h>

WINDOW *cmdwin;

main()
{
        int i, c;
        char buf[120];
        void exit();

        initscr();
        nonl();
        noecho();
        cbreak();

        cmdwin = newwin(3, COLS, 0, 0); /* top 3 lines */
        for (i = 0; i < LINES; i++)
                mvprintw(i, 0, "This is line %d of stdscr", i);

        for (;;)
        {
                refresh();
                c = getch();
                switch (c)
                {
                case 'c':       /* Enter command from keyboard */
                        werase(cmdwin);
                        wprintw(cmdwin, "Enter command:");
                        wmove(cmdwin, 2, 0);
                        for (i = 0; i < COLS; i++)
                                waddch(cmdwin, '-');
                        wmove(cmdwin, 1, 0);
                        touchwin(cmdwin);
                        wrefresh(cmdwin);
                        wgetstr(cmdwin, buf);
                        touchwin(stdscr);
                        /*
                         * The command is now in buf.
                         * It should be processed here.
                         */
                        break;
                case 'q':
                        endwin();
                        exit(0);
                }
        }
}
The colors Program



This program creates two windows. All characters displayed in the first 
window will be in red, on a blue background. All characters displayed in 
the second window will be in yellow, on a magenta background. 




#include <curses.h>

#define PAIR1   1
#define PAIR2   2

main()
{
        WINDOW *win1, *win2;

        initscr();

        if ((start_color()) == OK)
        {
                /* create windows */
                win1 = newwin(5, 40, 0, 0);
                win2 = newwin(5, 40, 15, 40);

                /* create two color pairs */
                init_pair(PAIR1, COLOR_RED, COLOR_BLUE);
                init_pair(PAIR2, COLOR_YELLOW, COLOR_MAGENTA);

                /* turn on color attributes for each window */
                wattron(win1, COLOR_PAIR(PAIR1));
                wattron(win2, COLOR_PAIR(PAIR2));

                /* print some text in each window and exit */
                waddstr(win1, "This should be red on blue");
                waddstr(win2, "This should be yellow on magenta");
                wnoutrefresh(win1);
                wnoutrefresh(win2);
                doupdate();

                /* wait for any key before terminating */
                wgetch(win2);
        }

        endwin();
}

The Common Object File Format (COFF)

This section describes the Common Object File Format (COFF) used on
AT&T computers with the UNIX operating system. COFF is the format of
the output file produced by the assembler, as, and the link editor, ld.

Some key features of COFF are 

■ applications can add system-dependent information to the object file 
without causing access utilities to become obsolete 

■ space is provided for symbolic information used by debuggers and 
other applications 

■ programmers can modify the way the object file is constructed by 
providing directives at compile time 

The object file supports user-defined sections and contains extensive
information for symbolic software testing. An object file contains

■ a file header

■ optional header information

■ a table of section headers

■ data corresponding to the section headers

■ relocation information

■ line numbers

■ a symbol table

■ a string table



Figure 11-1 shows the overall structure. 
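        +-------------------------------+
        | FILE HEADER                   |
        +-------------------------------+
        | Optional Information          |
        +-------------------------------+
        | Section 1 Header              |
        |            . . .              |
        | Section n Header              |
        +-------------------------------+
        | Raw Data for Section 1        |
        |            . . .              |
        | Raw Data for Section n        |
        +-------------------------------+
        | Relocation Info for Sect. 1   |
        |            . . .              |
        | Relocation Info for Sect. n   |
        +-------------------------------+
        | Line Numbers for Sect. 1      |
        |            . . .              |
        | Line Numbers for Sect. n      |
        +-------------------------------+
        | SYMBOL TABLE                  |
        +-------------------------------+
        | STRING TABLE                  |
        +-------------------------------+

Figure 11-1: Object File Format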



The last four sections (relocation, line numbers, symbol table, and the string
table) may be missing if the program is linked with the -s option of the ld
command, or if the line number information, symbol table, and string table
are removed by the strip command. The line number information does not
appear unless the program is compiled with the -g option of the cc
command. Also, if there are no unresolved external references after linking, the
relocation information is no longer needed and is absent. The string table
is also absent if the source file does not contain any symbols with names
longer than eight characters.

An object file that contains no errors or unresolved references is con- 
sidered executable. 
Definitions and Conventions

Before proceeding further, you should become familiar with the following
terms and conventions.

Sections 

A section is the smallest portion of an object file that is relocated and 
treated as one separate and distinct entity. In the most common case, there 
are three sections named .text, .data, and .bss. Additional sections accom- 
modate comments, multiple text or data segments, shared data segments, or 
user-specified sections. However, the UNIX operating system loads only 
.text, .data, and .bss into memory when the file is executed. 



NOTE

It is a mistake to assume that every COFF file will have a certain number
of sections, or to assume characteristics of sections such as their order,
their location in the object file, or the address at which they are to be
loaded. This information is available only after the object file has been
created. Programs manipulating COFF files should obtain it from file
and section headers in the file.

Physical and Virtual Addresses 

The physical address of a section or symbol is the offset of that section 
or symbol from address zero of the address space. The term physical 
address as used in COFF does not correspond to general usage. The physi- 
cal address of an object is not necessarily the address at which the object is 
placed when the process is executed. For example, on a system with pag- 
ing, the address is located with respect to address zero of virtual memory 
and the system performs another address translation. The section header 
contains two address fields, a physical address, and a virtual address; but in 
all versions of COFF on UNIX systems, the physical address is equivalent to 
the virtual address. 



Target Machine 

Compilers and link editors produce executable object files that are 
intended to be run on a particular computer. In the case of cross-compilers, 
the compilation and link editing are done on one computer with the intent 
of creating an object file that can be executed on another computer. The 
term target machine refers to the computer on which the object file is des- 
tined to run. In the majority of cases, the target machine is the exact same 
computer on which the object file is being created. 
File Header

The file header contains the 20 bytes of information shown in Figure
11-2. The last 2 bytes are flags that are used by ld and object file utilities.

Bytes    Declaration       Name        Description

0-1      unsigned short    f_magic     Magic number
2-3      unsigned short    f_nscns     Number of sections
4-7      long int          f_timdat    Time and date stamp indicating
                                       when the file was created,
                                       expressed as the number of
                                       elapsed seconds since 00:00:00
                                       GMT, January 1, 1970
8-11     long int          f_symptr    File pointer containing the
                                       starting address of the symbol
                                       table
12-15    long int          f_nsyms     Number of entries in the symbol
                                       table
16-17    unsigned short    f_opthdr    Number of bytes in the optional
                                       header
18-19    unsigned short    f_flags     Flags (see Figure 11-3)

Figure 11-2: File Header Contents



Magic Numbers 

The magic number specifies the target machine on which the object file 
is executable. 

Flags 

The last 2 bytes of the file header are flags that describe the type of the 
object file. Currently defined flags are found in the header file filehdr.h, 
and are shown in Figure 11-3. 
Mnemonic      Flag       Meaning

F_RELFLG      00001      Relocation information stripped from the file
F_EXEC        00002      File is executable (i.e., no unresolved
                         external references)
F_LNNO        00004      Line numbers stripped from the file
F_LSYMS       00010      Local symbols stripped from the file
F_AR32W       0001000    32 bit word
F_BM32B       0020000    32100 required
F_BM32MAU     0040000    MAU required

Figure 11-3: File Header Flags (3B2 Computer)
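
As a rough illustration of how a utility might test these flags, the sketch below masks f_flags with the octal values from Figure 11-3. The two helper function names are our own inventions for this example and are not part of filehdr.h.

```c
/* Flag values copied from Figure 11-3 (octal constants). */
#define F_RELFLG  00001   /* relocation information stripped */
#define F_EXEC    00002   /* file is executable */
#define F_LNNO    00004   /* line numbers stripped */
#define F_LSYMS   00010   /* local symbols stripped */

/* Hypothetical helper: nonzero if the header marks the file executable. */
int coff_is_executable(unsigned short f_flags)
{
    return (f_flags & F_EXEC) != 0;
}

/* Hypothetical helper: nonzero only if relocation, line number, and
 * local symbol information have all been stripped. */
int coff_is_fully_stripped(unsigned short f_flags)
{
    unsigned short mask = F_RELFLG | F_LNNO | F_LSYMS;

    return (f_flags & mask) == mask;
}
```

A caller would pass the f_flags field read from the file header; both helpers only inspect bits and never modify the header.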



File Header Declaration 

The C structure declaration for the file header is given in Figure 11-4. 
This declaration may be found in the header file filehdr.h. 
struct filehdr
{
        unsigned short  f_magic;    /* magic number */
        unsigned short  f_nscns;    /* number of sections */
        long            f_timdat;   /* time and date stamp */
        long            f_symptr;   /* file ptr to symbol table */
        long            f_nsyms;    /* number entries in the symbol table */
        unsigned short  f_opthdr;   /* size of optional header */
        unsigned short  f_flags;    /* flags */
};

#define FILHDR  struct filehdr
#define FILHSZ  sizeof(FILHDR)

Figure 11-4: File Header Declaration
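
On a machine whose long is not 4 bytes, the declaration above cannot simply be overlaid on the file contents. The following sketch decodes the 20 header bytes by explicit offset instead. It assumes the fields are stored most significant byte first; that byte order, the struct coff_filehdr_info type, and the function names are assumptions made for this illustration.

```c
/* Host-side copy of the header fields (field names follow Figure 11-2). */
struct coff_filehdr_info {
    unsigned int  f_magic;
    unsigned int  f_nscns;
    unsigned long f_timdat;
    unsigned long f_symptr;
    unsigned long f_nsyms;
    unsigned int  f_opthdr;
    unsigned int  f_flags;
};

/* Assumption: 2-byte fields are stored most significant byte first. */
static unsigned int be16(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}

/* Assumption: 4-byte fields are stored most significant byte first. */
static unsigned long be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Decode the 20 raw header bytes using the offsets in Figure 11-2. */
void coff_decode_filehdr(const unsigned char *raw,
                         struct coff_filehdr_info *out)
{
    out->f_magic  = be16(raw + 0);
    out->f_nscns  = be16(raw + 2);
    out->f_timdat = be32(raw + 4);
    out->f_symptr = be32(raw + 8);
    out->f_nsyms  = be32(raw + 12);
    out->f_opthdr = be16(raw + 16);
    out->f_flags  = be16(raw + 18);
}
```

Reading by offset keeps the decoder independent of the host compiler's type sizes and structure padding.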



Optional Header Information 

The template for optional information varies among different systems 
that use COFF. Applications place all system-dependent information into 
this record. This allows different operating systems access to information 
that only that operating system uses without forcing all COFF files to save 
space for that information. General utility programs (for example, the sym- 
bol table access library functions, the disassembler, etc.) are made to work 
properly on any common object file. This is done by seeking past this 
record using the size of optional header information in the file header field 
f_opthdr. 
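
The seek described above amounts to simple arithmetic. As a sketch, assuming the 20-byte file header and the 40-byte section header entries described in this chapter, the file offset of a section header can be computed as follows; the function name and the zero-based index are our own conventions.

```c
#define COFF_FILHSZ 20L   /* bytes in the file header */
#define COFF_SCNHSZ 40L   /* bytes in one section header entry */

/* Hypothetical helper: byte offset of section header number `index`
 * (counted from 0), skipping the optional header whatever its size. */
long coff_scnhdr_offset(unsigned short f_opthdr, unsigned short index)
{
    return COFF_FILHSZ + (long)f_opthdr + COFF_SCNHSZ * (long)index;
}
```

Because only f_opthdr is consulted, the same calculation works whether or not the utility understands the contents of the optional header.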
Standard UNIX System a.out Header

By default, files produced by the link editor for a UNIX system always
have a standard UNIX system a.out header in the optional header field.
The UNIX system a.out header is 28 bytes. The fields of the optional
header are described in Figure 11-5.

Bytes    Declaration    Name          Description

0-1      short          magic         Magic number
2-3      short          vstamp        Version stamp
4-7      long int       tsize         Size of text in bytes
8-11     long int       dsize         Size of initialized data in bytes
12-15    long int       bsize         Size of uninitialized data in bytes
16-19    long int       entry         Entry point
20-23    long int       text_start    Base address of text
24-27    long int       data_start    Base address of data

Figure 11-5: Optional Header Contents (3B2, 3B5, 3B15 Computers)



Whereas the magic number in the file header specifies the machine on
which the object file runs, the magic number in the optional header
supplies information telling the operating system on that machine how that file
should be executed. The magic numbers recognized by the 3B2/3B5/3B15
UNIX operating system are given in Figure 11-6.
Value    Meaning

0407     The text segment is not write-protected or sharable; the
         data segment is contiguous with the text segment.
0410     The data segment starts at the next segment following the
         text segment and the text segment is write protected.
0413     Text and data segments are aligned within a.out so it can
         be directly paged.

Figure 11-6: UNIX System Magic Numbers (3B2, 3B5, 3B15 Computers)



Optional Header Declaration 

The C language structure declaration currently used for the UNIX sys- 
tem a.out file header is given in Figure 11-7. This declaration may be found 
in the header file aouthdr.h. 
typedef struct aouthdr
{
        short   magic;          /* magic number */
        short   vstamp;         /* version stamp */
        long    tsize;          /* text size in bytes, padded */
                                /* to full word boundary */
        long    dsize;          /* initialized data size */
        long    bsize;          /* uninitialized data size */
        long    entry;          /* entry point */
        long    text_start;     /* base of text for this file */
        long    data_start;     /* base of data for this file */
} AOUTHDR;

Figure 11-7: aouthdr Declaration



Section Headers 

Every object file has a table of section headers to specify the layout of 
data within the file. The section header table consists of one entry for 
every section in the file. The information in the section header is described 
in Figure 11-8. 
Bytes    Declaration       Name         Description

0-7      char              s_name       8-character null padded section
                                        name
8-11     long int          s_paddr      Physical address of section
12-15    long int          s_vaddr      Virtual address of section
16-19    long int          s_size       Section size in bytes
20-23    long int          s_scnptr     File pointer to raw data
24-27    long int          s_relptr     File pointer to relocation
                                        entries
28-31    long int          s_lnnoptr    File pointer to line number
                                        entries
32-33    unsigned short    s_nreloc     Number of relocation entries
34-35    unsigned short    s_nlnno      Number of line number entries
36-39    long int          s_flags      Flags (see Figure 11-9)

Figure 11-8: Section Header Contents



The size of a section is padded to a multiple of 4 bytes. File pointers are 
byte offsets that can be used to locate the start of data, relocation, or line 
number entries for the section. They can be readily used with the UNIX 
system function fseek(3S). 
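
The padding rule and the use of a file pointer with fseek can be sketched as below. The helper names are hypothetical; the only library routine used is the standard fseek.

```c
#include <stdio.h>

/* Round a non-negative byte count up to the next multiple of 4, the
 * padding applied to section sizes. */
long coff_pad4(long nbytes)
{
    return (nbytes + 3L) & ~3L;
}

/* Hypothetical helper: position fp at a file pointer taken from a
 * section header (s_scnptr, s_relptr, or s_lnnoptr). File pointers are
 * byte offsets from the start of the file, so SEEK_SET is used.
 * Returns 0 on success, nonzero on failure, as fseek does. */
int coff_seek_to(FILE *fp, long file_pointer)
{
    return fseek(fp, file_pointer, SEEK_SET);
}
```

For example, a section holding 13 bytes of data would be recorded with a size of coff_pad4(13), which is 16.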

Flags 

The lower 2 bytes of the flag field indicate a section type. The flags are 
described in Figure 11-9. 
Mnemonic       Flag     Meaning

STYP_REG       0x00     Regular section (allocated, relocated, loaded)
STYP_DSECT     0x01     Dummy section (not allocated, relocated, not
                        loaded)
STYP_NOLOAD    0x02     Noload section (allocated, relocated, not
                        loaded)
STYP_GROUP     0x04     Grouped section (formed from input sections)
STYP_PAD       0x08     Padding section (not allocated, not relocated,
                        loaded)
STYP_COPY      0x10     Copy section (for a decision function used in
                        updating fields; not allocated, not relocated,
                        loaded, relocation and line number entries
                        processed normally)
STYP_TEXT      0x20     Section contains executable text
STYP_DATA      0x40     Section contains initialized data
STYP_BSS       0x80     Section contains only uninitialized data
STYP_INFO      0x200    Comment section (not allocated, not relocated,
                        not loaded)
STYP_OVER      0x400    Overlay section (relocated, not allocated, not
                        loaded)
STYP_LIB       0x800    For .lib section (treated like STYP_INFO)

Figure 11-9: Section Header Flags
Section Header Declaration

The C structure declaration for the section headers is described in
Figure 11-10. This declaration may be found in the header file scnhdr.h.

struct scnhdr
{
        char            s_name[8];  /* section name */
        long            s_paddr;    /* physical address */
        long            s_vaddr;    /* virtual address */
        long            s_size;     /* section size */
        long            s_scnptr;   /* file ptr to section raw data */
        long            s_relptr;   /* file ptr to relocation */
        long            s_lnnoptr;  /* file ptr to line number */
        unsigned short  s_nreloc;   /* number of relocation entries */
        unsigned short  s_nlnno;    /* number of line number entries */
        long            s_flags;    /* flags */
};

#define SCNHDR  struct scnhdr
#define SCNHSZ  sizeof(SCNHDR)

Figure 11-10: Section Header Declaration



.bss Section Header

The one deviation from the normal rule in the section header table is
the entry for uninitialized data in a .bss section. A .bss section has a size
and symbols that refer to it, and symbols that are defined in it. At the same
time, a .bss section has no relocation entries, no line number entries, and no
data. Therefore, a .bss section has an entry in the section header table but
occupies no space elsewhere in the file. In this case, the number of
relocation and line number entries, as well as all file pointers in a .bss section
header, are 0. The same is true of the STYP_NOLOAD and STYP_DSECT
sections.
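
A program that copies or dumps raw section data has to skip these header-only sections. A minimal sketch of that check follows, using the flag values from Figure 11-9; the function name is hypothetical.

```c
/* Section type flags copied from Figure 11-9. */
#define STYP_DSECT   0x01   /* dummy section */
#define STYP_NOLOAD  0x02   /* noload section */
#define STYP_BSS     0x80   /* uninitialized data */

/* Hypothetical helper: nonzero if a section with these flags is
 * expected to have raw data in the file. .bss, noload, and dummy
 * sections have a header entry but occupy no space elsewhere. */
int coff_section_has_raw_data(long s_flags)
{
    return (s_flags & (STYP_DSECT | STYP_NOLOAD | STYP_BSS)) == 0;
}
```

A cautious reader of COFF files might also confirm that s_scnptr is nonzero before seeking, since the flags alone are only a hint about layout.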



Sections 

Figure 11-1 shows that section headers are followed by the appropriate 
number of bytes of text or data. The raw data for each section begins on a 
4-byte boundary in the file. 

Link editor SECTIONS directives (see Chapter 12) allow users to, among 
other things: 

■ describe how input sections are to be combined 

■ direct the placement of output sections 

■ rename output sections 

If no SECTIONS directives are given, each input section appears in an 
output section of the same name. For example, if a number of object files, 
each with a .text section, are linked together the output object file contains 
a single .text section made up of the combined input .text sections. 



Relocation Information

Object files have one relocation entry for each relocatable reference in 
the text or data. The relocation information consists of entries with the for- 
mat described in Figure 11-11. 



Bytes    Declaration       Name        Description

0-3      long int          r_vaddr     (Virtual) address of reference
4-7      long int          r_symndx    Symbol table index
8-9      unsigned short    r_type      Relocation type

Figure 11-11: Relocation Section Contents
The first 4 bytes of the entry are the virtual address of the text or data
to which this entry applies. The next field is the index, counted from 0, of 
the symbol table entry that is being referenced. The type field indicates the 
type of relocation to be applied. 

As the link editor reads each input section and performs relocation, the 
relocation entries are read. They direct how references found within the 
input section are treated. The currently recognized relocation types are 
given in Figure 11-12. 



Mnemonic    Flag    Meaning

R_ABS       0       Reference is absolute; no relocation is necessary.
                    The entry will be ignored.
R_DIR32     06      Direct 32-bit reference to the symbol's virtual
                    address.
R_DIR32S    012     Direct 32-bit reference to the symbol's virtual
                    address, with the 32-bit value stored in the
                    reverse order in the object file.

Figure 11-12: Relocation Types (3B2, 3B5, 3B15 Computers)
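
The difference between the two direct types can be illustrated with a much simplified sketch. It only stores a final 32-bit value into a 4-byte field; a real link editor does considerably more. Two points are assumptions made for the example: that the normal order is most significant byte first, and that R_ABS has the value 0. The function name is also hypothetical.

```c
#define R_ABS     0     /* assumed value; entry is ignored */
#define R_DIR32   06    /* direct 32-bit reference */
#define R_DIR32S  012   /* same, bytes stored in reverse order */

/* Store `value` into the 4-byte field according to the relocation type.
 * Returns 0 on success, -1 for a type this sketch does not handle. */
int coff_store_reference(unsigned char *field, unsigned long value,
                         unsigned short r_type)
{
    int i;

    switch (r_type) {
    case R_ABS:                 /* absolute: leave the field alone */
        return 0;
    case R_DIR32:               /* most significant byte first */
        for (i = 0; i < 4; i++)
            field[i] = (unsigned char)((value >> (8 * (3 - i))) & 0xFF);
        return 0;
    case R_DIR32S:              /* reversed: least significant first */
        for (i = 0; i < 4; i++)
            field[i] = (unsigned char)((value >> (8 * i)) & 0xFF);
        return 0;
    default:
        return -1;
    }
}
```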



Relocation Entry Declaration 

The structure declaration for relocation entries is given in Figure 11-13. 
This declaration may be found in the header file reloc.h. 
struct reloc
{
        long            r_vaddr;    /* virtual address of reference */
        long            r_symndx;   /* index into symbol table */
        unsigned short  r_type;     /* relocation type */
};

#define RELOC   struct reloc
#define RELSZ   10

Figure 11-13: Relocation Entry Declaration



Line Numbers 

When invoked with the -g option, the cc, and ill commands cause an 
entry in the object file for every source line where a breakpoint can be 
inserted. You can then reference line numbers when using a software 
debugger like sdb. All line numbers in a section are grouped by function 
as shown in Figure 11-14. 
        symbol index            0
        physical address        line number
        physical address        line number
                  .
                  .
                  .
        symbol index            0
        physical address        line number
        physical address        line number

Figure 11-14: Line Number Grouping



The first entry in a function grouping has line number 0 and has, in
place of the physical address, an index into the symbol table for the entry
containing the function name. Subsequent entries have actual line numbers
and addresses of the text corresponding to the line numbers. The line
number entries are relative to the beginning of the function, and appear in
increasing order of address.
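
Since a line number of 0 marks the first entry of each function grouping, a program can find the groupings by scanning for it. The sketch below works on entries that have already been decoded into a host structure; struct line_entry and count_line_groups are names invented for this example.

```c
/* Decoded line number entry. When lnno is 0, addr_or_symndx holds a
 * symbol table index; otherwise it holds the address of the text. */
struct line_entry {
    long           addr_or_symndx;
    unsigned short lnno;
};

/* Count the function groupings in an array of n entries by counting
 * the entries whose line number is 0. */
int count_line_groups(const struct line_entry *entries, int n)
{
    int i, groups = 0;

    for (i = 0; i < n; i++)
        if (entries[i].lnno == 0)
            groups++;
    return groups;
}
```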

Line Number Declaration 

The structure declaration currently used for line number entries is given 
in Figure 11-15. 
struct lineno
{
        union
        {
                long    l_symndx;   /* symtbl index of func name */
                long    l_paddr;    /* paddr of line number */
        } l_addr;
        unsigned short  l_lnno;     /* line number */
};

#define LINENO  struct lineno
#define LINESZ  6

Figure 11-15: Line Number Entry Declaration



Symbol Table 

Because of symbolic debugging requirements, the order of symbols in 
the symbol table is very important. Symbols appear in the sequence shown 
in Figure 11-16. 
filename 1
function 1 
local symbols 
for function 1 

function 2 
local symbols 
for function 2 



statics 



filename 2 
function 1 
local symbols 
for function 1 



statics 



defined global 
symbols 
undefined global 
symbols 

Figure 11-16: COFF Symbol Table 



The word statics in Figure 11-16 means symbols defined with the C 
language storage class static outside any function. The symbol table consists 
of at least one fixed-length entry per symbol with some symbols followed 
by auxiliary entries of the same size. The entry for each symbol is a struc- 
ture that holds the value, the type, and other information. 

Special Symbols 

The symbol table contains some special symbols that are generated by 
as, and other tools. These symbols are given in Figure 11-17. 
Symbol     Meaning

.file      filename
.text      address of .text section
.data      address of .data section
.bss       address of .bss section
.bb        address of start of inner block
.eb        address of end of inner block
.bf        address of start of function
.ef        address of end of function
.target    pointer to the structure or union returned by a function
.xfake     dummy tag name for structure, union, or enumeration
.eos       end of members of structure, union, or enumeration
etext      next available address after the end of the output
           section .text
edata      next available address after the end of the output
           section .data
end        next available address after the end of the output
           section .bss

Figure 11-17: Special Symbols in the Symbol Table



Six of these special symbols occur in pairs. The .bb and .eb symbols
indicate the boundaries of inner blocks; a .bf and .ef pair brackets each
function. An .xfake and .eos pair names and defines the limit of structures,
unions, and enumerations that were not named. The .eos symbol also
appears after named structures, unions, and enumerations.

When a structure, union, or enumeration has no tag name, the compiler
invents a name to be used in the symbol table. The name chosen for the
symbol table is .xfake, where x is an integer. If there are three unnamed
structures, unions, or enumerations in the source, their tag names are
.0fake, .1fake, and .2fake. Each of the special symbols has different
information stored in the symbol table entry as well as the auxiliary entries.
Inner Blocks

The C language defines a block as a compound statement that begins
and ends with braces, { and }. An inner block is a block that occurs within
a function (which is also a block).

For each inner block that has local symbols defined, a special symbol,
.bb, is put in the symbol table immediately before the first local symbol of
that block. Also a special symbol, .eb, is put in the symbol table
immediately after the last local symbol of that block. The sequence is shown in
Figure 11-18.

        .bb
        local symbols
        for that block
        .eb

Figure 11-18: Special Symbols (.bb and .eb)



Because inner blocks can be nested by several levels, the .bb-.eb pairs 
and associated symbols may also be nested. See Figure 11-19. 
{                               /* block 1 */
        int i;
        char c;

        {                       /* block 2 */
                long a;

                {               /* block 3 */
                        int x;
                }               /* block 3 */
        }                       /* block 2 */

        {                       /* block 4 */
                long i;
        }                       /* block 4 */
}                               /* block 1 */

Figure 11-19: Nested blocks



The symbol table would look like Figure 11-20. 
        .bb for block 1
             i
             c
        .bb for block 2
             a
        .bb for block 3
             x
        .eb for block 3
        .eb for block 2
        .bb for block 4
             i
        .eb for block 4
        .eb for block 1

Figure 11-20: Example of the Symbol Table
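
Because the .bb and .eb symbols nest like parentheses, a program scanning the symbol table can track block depth with a simple counter. The following is a minimal sketch over a list of symbol names; the function name and the convention of returning -1 for an unbalanced sequence are our own.

```c
#include <string.h>

/* Return the deepest nesting level reached by .bb/.eb pairs in the
 * given sequence of n symbol names, or -1 if the pairs do not balance. */
int max_block_depth(const char *const names[], int n)
{
    int i, depth = 0, deepest = 0;

    for (i = 0; i < n; i++) {
        if (strcmp(names[i], ".bb") == 0) {
            depth++;
            if (depth > deepest)
                deepest = depth;
        } else if (strcmp(names[i], ".eb") == 0) {
            if (depth == 0)
                return -1;      /* .eb with no open block */
            depth--;
        }
    }
    return depth == 0 ? deepest : -1;
}
```

Applied to a sequence of symbols nested like the blocks of Figure 11-19, the function would report a depth of 3, reached inside block 3.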



Symbols and Functions 

For each function, a special symbol .bf is put between the function 
name and the first local symbol of the function in the symbol table. Also, a 
special symbol .ef is put immediately after the last local symbol of the func- 
tion in the symbol table. The sequence is shown in Figure 11-21. 



function name 
.bf 

local symbol 
.ef 



Figure 11-21: Symbols for Functions 
Symbol Table Entries

All symbols, regardless of storage class and type, have the same format
for their entries in the symbol table. The symbol table entries each contain
18 bytes of information. The meaning of each of the fields in the symbol
table entry is described in Figure 11-22. It should be noted that indices for
symbol table entries begin at 0 and count upward. Each auxiliary entry also
counts as one symbol.
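
Because auxiliary entries occupy index positions of their own, a program walking the table must step over them. A minimal sketch follows, assuming the count of auxiliary entries for each entry has already been extracted into an array; the function and parameter names are hypothetical.

```c
/* numaux[i] is the auxiliary entry count stored in table entry i. It is
 * meaningful only when entry i is a primary symbol entry; positions
 * occupied by auxiliary entries are skipped and never examined.
 * Returns the number of primary symbol entries among nsyms entries. */
long count_primary_symbols(const unsigned char *numaux, long nsyms)
{
    long i = 0, count = 0;

    while (i < nsyms) {
        count++;
        i += 1 + (long)numaux[i];   /* skip this entry and its auxiliaries */
    }
    return count;
}
```

The same stepping rule is what keeps symbol table indices stored elsewhere in the file, such as in relocation entries, pointing at the intended symbol.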



Bytes    Declaration         Name        Description

0-7      (see text below)    _n          These 8 bytes contain either a
                                         symbol name or an index to a
                                         symbol
8-11     long int            n_value     Symbol value; storage class
                                         dependent
12-13    short               n_scnum     Section number of symbol
14-15    unsigned short      n_type      Basic and derived type
                                         specification
16       char                n_sclass    Storage class of symbol
17       char                n_numaux    Number of auxiliary entries

Figure 11-22: Symbol Table Entry Format



Symbol Names 

The first 8 bytes in the symbol table entry are a union of a character 
array and two longs. If the symbol name is eight characters or less, the 
(null-padded) symbol name is stored there. If the symbol name is longer 
than eight characters, then the entire symbol name is stored in the string 
table. In this case, the 8 bytes contain two long integers, the first is zero, 
and the second is the offset (relative to the beginning of the string table) of 
the name in the string table. Since there can be no symbols with a null 
name, the zeroes on the first 4 bytes serve to distinguish a symbol table 
entry with an offset from one with a name in the first 8 bytes as shown in 
Figure 11-23. 
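
The rule for choosing between the two forms of the name field can be sketched as follows. The example works on the raw 8 bytes and assumes the offset is stored most significant byte first; that byte order and the function name are assumptions made for this illustration. The caller supplies a 9-byte buffer because an 8-character name stored in the entry has no terminating null.

```c
#include <string.h>

/* Return the symbol's name. `n` points at the 8-byte name field,
 * `strtab` at the beginning of the string table, and `buf` at 9 bytes
 * of scratch space used when the name is stored in the entry itself. */
const char *coff_symbol_name(const unsigned char *n, const char *strtab,
                             char *buf)
{
    if (n[0] == 0 && n[1] == 0 && n[2] == 0 && n[3] == 0) {
        /* First 4 bytes are zero: the next 4 are a string table offset
         * (assumed to be stored most significant byte first). */
        unsigned long off = ((unsigned long)n[4] << 24) |
                            ((unsigned long)n[5] << 16) |
                            ((unsigned long)n[6] << 8)  |
                             (unsigned long)n[7];
        return strtab + off;
    }
    memcpy(buf, n, 8);      /* null-padded name of up to 8 characters */
    buf[8] = '\0';
    return buf;
}
```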
Bytes    Declaration    Name        Description

0-7      char           n_name      8-character null-padded symbol name
0-3      long           n_zeroes    Zero in this field indicates the
                                    name is in the string table
4-7      long           n_offset    Offset of the name in the string
                                    table

Figure 11-23: Name Field



Special symbols generated by the C Compilation System are discussed 
above in "Special Symbols." 

Storage Classes 

The storage class field has one of the values described in Figure 11-24. 
These #define's may be found in the header file storclass.h. 



Mnemonic      Value    Storage Class

C_EFCN        -1       physical end of a function
C_NULL        0
C_AUTO        1        automatic variable
C_EXT         2        external symbol
C_STAT        3        static
C_REG         4        register variable
C_EXTDEF      5        external definition
C_LABEL       6        label
C_ULABEL      7        undefined label
C_MOS         8        member of structure
C_ARG         9        function argument
C_STRTAG      10       structure tag
C_MOU         11       member of union
C_UNTAG       12       union tag
C_TPDEF       13       type definition
C_USTATIC     14       uninitialized static
C_ENTAG       15       enumeration tag
C_MOE         16       member of enumeration
C_REGPARM     17       register parameter
C_FIELD       18       bit field
C_BLOCK       100      beginning and end of block
C_FCN         101      beginning and end of function
C_EOS         102      end of structure
C_FILE        103      filename
C_LINE        104      used only by utility programs
C_ALIAS       105      duplicated tag
C_HIDDEN      106      like static, used to avoid name conflicts

Figure 11-24: Storage Classes



All of these storage classes except for C_ALIAS and C_HIDDEN are 
generated by the cc or as commands. The compress utility, cprs, generates the 
C_ALIAS mnemonic. This utility (described in the User's Reference Manual) 
removes duplicated structure, union, and enumeration definitions and puts 
alias entries in their places. The storage class C_HIDDEN is not used by 
any UNIX system tools. 

Some of these storage classes are used only internally by the C 
Compilation System. These storage classes are C_EFCN, C_EXTDEF, C_ULABEL, 
C_USTATIC, and C_LINE. 

Storage Classes for Special Symbols 

Some special symbols are restricted to certain storage classes. They are 
given in Figure 11-25. 

Special Symbol   Storage Class

.file            C_FILE
.bb              C_BLOCK
.eb              C_BLOCK
.bf              C_FCN
.ef              C_FCN
.target          C_AUTO
.xfake           C_STRTAG, C_UNTAG, C_ENTAG
.eos             C_EOS
.text            C_STAT
.data            C_STAT
.bss             C_STAT

Figure 11-25: Storage Class by Special Symbols 



Also some storage classes are used only for certain special symbols. 
They are summarized in Figure 11-26. 



Storage Class   Special Symbol

C_BLOCK         .bb, .eb
C_FCN           .bf, .ef
C_EOS           .eos
C_FILE          .file

Figure 11-26: Restricted Storage Classes 



Symbol Value Field 

The meaning of the value of a symbol depends on its storage class. This 
relationship is summarized in Figure 11-27. 

Storage Class   Meaning of Value

C_AUTO          stack offset in bytes
C_EXT           relocatable address
C_STAT          relocatable address
C_REG           register number
C_LABEL         relocatable address
C_MOS           offset in bytes
C_ARG           stack offset in bytes
C_STRTAG        0
C_MOU           0
C_UNTAG         0
C_TPDEF         0
C_ENTAG         0
C_MOE           enumeration value
C_REGPARM       register number
C_FIELD         bit displacement
C_BLOCK         relocatable address
C_FCN           relocatable address
C_EOS           size
C_FILE          (see text below)
C_ALIAS         tag index
C_HIDDEN        relocatable address

Figure 11-27: Storage Class and Value 



If a symbol has storage class C_FILE, the value of that symbol equals the 
symbol table entry index of the next .file symbol. That is, the .file entries 
form a one-way linked list in the symbol table. If there are no more .file 
entries in the symbol table, the value of the symbol is the index of the first 
global symbol. 
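
The one-way linked list of .file entries can be followed with a short loop.
The sketch below works on a simplified in-memory table; struct sym and
count_file_entries are hypothetical names introduced only for illustration,
and only the two fields the walk needs are modelled.

```c
#include <stddef.h>

#define C_FILE 103   /* storage class value from Figure 11-24 */

/* Hypothetical in-memory view of one symbol table slot. */
struct sym {
    long value;      /* n_value: for .file, index of the next .file entry */
    int  sclass;     /* n_sclass */
};

/* Count the .file entries by following the chain that starts at index
   first. The walk stops at the first entry whose storage class is not
   C_FILE (the first global symbol) or when the index leaves the table. */
size_t count_file_entries(const struct sym *tab, size_t nsyms, size_t first)
{
    size_t count = 0;
    size_t i = first;

    while (i < nsyms && tab[i].sclass == C_FILE) {
        long next = tab[i].value;
        count++;
        if (next <= (long)i)    /* malformed chain: refuse to go backward */
            break;
        i = (size_t)next;
    }
    return count;
}
```

The backward-link check is a defensive assumption of this sketch, not a rule
stated by the format; it only guards against looping forever on a damaged
table.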

Relocatable symbols have a value equal to the virtual address of that 
symbol. When the section is relocated by the link editor, the value of these 
symbols changes. 
Section Number Field 

Section numbers are listed in Figure 11-28. 



Mnemonic   Section Number   Meaning

N_DEBUG    -2               Special symbolic debugging symbol
N_ABS      -1               Absolute symbol
N_UNDEF    0                Undefined external symbol
N_SCNUM    1-077777         Section number where symbol is defined

Figure 11-28: Section Number 



A special section number (-2) marks symbolic debugging symbols, 
including structure/union/enumeration tag names, typedefs, and the name 
of the file. A section number of -1 indicates that the symbol has a value 
but is not relocatable. Examples of absolute-valued symbols include 
automatic and register variables, function arguments, and .eos symbols. 

With one exception, a section number of 0 indicates a relocatable 
external symbol that is not defined in the current file. The one exception is a 
multiply defined external symbol (i.e., FORTRAN common or an 
uninitialized variable defined external to a function in C). In the symbol table of 
each file where the symbol is defined, the section number of the symbol is 
0 and the value of the symbol is a positive number giving the size of the 
symbol. When the files are combined to form an executable object file, the link 
editor combines all the input symbols of the same name into one symbol 
with the section number of the .bss section. The maximum size of all the 
input symbols with the same name is used to allocate space for the symbol 
and the value becomes the address of the symbol. This is the only case 
where a symbol has a section number of 0 and a non-zero value. 
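
The rule for multiply defined external symbols can be expressed as a small
check plus a maximum. The sketch below is illustrative only: struct ext_sym,
is_common, and common_alloc_size are hypothetical names, and the input is
assumed to be the entries for one symbol name gathered from several files.

```c
#define N_UNDEF 0   /* section number from Figure 11-28 */

/* Hypothetical view of one external symbol as seen in one input file. */
struct ext_sym {
    short scnum;    /* n_scnum */
    long  value;    /* n_value: the size, for a multiply defined symbol */
};

/* Nonzero if the entry has section number 0 and a non-zero value,
   i.e. it is the multiply defined (common) case described above. */
int is_common(const struct ext_sym *s)
{
    return s->scnum == N_UNDEF && s->value > 0;
}

/* Space to allocate for a symbol that appears in n input files:
   the maximum size over all of its common entries. */
long common_alloc_size(const struct ext_sym *defs, int n)
{
    long max = 0;
    int i;

    for (i = 0; i < n; i++)
        if (is_common(&defs[i]) && defs[i].value > max)
            max = defs[i].value;
    return max;
}
```

An entry with section number 0 and value 0 is an ordinary undefined
reference and does not contribute to the size.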

Section Numbers and Storage Classes 

Symbols having certain storage classes are also restricted to certain 
section numbers. They are summarized in Figure 11-29. 

Storage Class   Section Number

C_AUTO          N_ABS
C_EXT           N_ABS, N_UNDEF, N_SCNUM
C_STAT          N_SCNUM
C_REG           N_ABS
C_LABEL         N_UNDEF, N_SCNUM
C_MOS           N_ABS
C_ARG           N_ABS
C_STRTAG        N_DEBUG
C_MOU           N_ABS
C_UNTAG         N_DEBUG
C_TPDEF         N_DEBUG
C_ENTAG         N_DEBUG
C_MOE           N_ABS
C_REGPARM       N_ABS
C_FIELD         N_ABS
C_BLOCK         N_SCNUM
C_FCN           N_SCNUM
C_EOS           N_ABS
C_FILE          N_DEBUG
C_ALIAS         N_DEBUG

Figure 11-29: Section Number and Storage Class 



Type Entry 

The type field in the symbol table entry contains information about the 
basic and derived type for the symbol. This information is generated by the 
C Compilation System only if the -g option is used. Each symbol has 
exactly one basic or fundamental type but can have more than one derived 
type. The format of the 16-bit type entry is 

    d6 | d5 | d4 | d3 | d2 | d1 | typ

Bits 0 through 3, called typ, indicate one of the fundamental types 
given in Figure 11-30. 



Mnemonic    Value   Type

T_NULL      0       type not assigned
T_VOID      1       void
T_CHAR      2       character
T_SHORT     3       short integer
T_INT       4       integer
T_LONG      5       long integer
T_FLOAT     6       floating point
T_DOUBLE    7       double word
T_STRUCT    8       structure
T_UNION     9       union
T_ENUM      10      enumeration
T_MOE       11      member of enumeration
T_UCHAR     12      unsigned character
T_USHORT    13      unsigned short
T_UINT      14      unsigned integer
T_ULONG     15      unsigned long

Figure 11-30: Fundamental Types 



Bits 4 through 15 are arranged as six 2-bit fields marked d1 through d6. 
These d fields represent levels of the derived types given in Figure 11-31. 

Mnemonic   Value   Type

DT_NON     0       no derived type
DT_PTR     1       pointer
DT_FCN     2       function
DT_ARY     3       array

Figure 11-31: Derived Types 



The following examples demonstrate the interpretation of the symbol 
table entry representing type. 

    char *func();

Here func is the name of a function that returns a pointer to a character. 
The fundamental type of func is 2 (character), the d1 field is 2 (function), 
and the d2 field is 1 (pointer). Therefore, the type word in the symbol table 
for func contains the hexadecimal number 0x62, which is interpreted to 
mean a function that returns a pointer to a character. 

    short *tabptr[10][25][3];

Here tabptr is a three-dimensional array of pointers to short integers. 
The fundamental type of tabptr is 3 (short integer); the d1, d2, and d3 fields 
each contain a 3 (array), and the d4 field is 1 (pointer). Therefore, the type 
entry in the symbol table contains the hexadecimal number 0x7f3, indicating 
a three-dimensional array of pointers to short integers. 
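
The two worked examples can be checked mechanically by unpacking the type
word. The sketch below assumes the bit layout given above (typ in bits 0-3,
d1 in bits 4-5, and so on); basic_type, derived_type, and describe_type are
hypothetical helper names, not routines supplied with the system.

```c
#include <stdio.h>
#include <stddef.h>

enum { DT_NON = 0, DT_PTR = 1, DT_FCN = 2, DT_ARY = 3 };

/* Fundamental type: bits 0 through 3. */
unsigned basic_type(unsigned short t)
{
    return t & 0xFu;
}

/* Derived type at level 1..6 (d1..d6): 2 bits each, starting at bit 4. */
unsigned derived_type(unsigned short t, int level)
{
    return (t >> (4 + 2 * (level - 1))) & 0x3u;
}

/* Build an English reading of the type word in out and return out. */
char *describe_type(unsigned short t, char *out, size_t outlen)
{
    static const char *const basic_names[16] = {
        "(not assigned)", "void", "char", "short", "int", "long",
        "float", "double", "struct", "union", "enum", "member of enum",
        "unsigned char", "unsigned short", "unsigned int", "unsigned long"
    };
    static const char *const derived_names[4] = {
        "", "pointer to ", "function returning ", "array of "
    };
    size_t used = 0;
    int level;

    if (outlen == 0)
        return out;
    out[0] = '\0';
    for (level = 1; level <= 6; level++) {
        unsigned d = derived_type(t, level);
        if (d == DT_NON)
            break;              /* no further derived levels */
        used += (size_t)snprintf(out + used, outlen - used, "%s",
                                 derived_names[d]);
        if (used >= outlen)
            return out;         /* output was truncated */
    }
    snprintf(out + used, outlen - used, "%s", basic_names[basic_type(t)]);
    return out;
}
```

With this reading, 0x62 comes out as "function returning pointer to char"
and 0x7f3 as "array of array of array of pointer to short", matching the
interpretations given in the text.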

Type Entries and Storage Classes 

Figure 11-32 shows the type entries that are legal for each storage class. 

Storage               d Entry                typ Entry
Class        Function?  Array?  Pointer?     Basic Type

C_AUTO       no         yes     yes          Any except T_MOE
C_EXT        yes        yes     yes          Any except T_MOE
C_STAT       yes        yes     yes          Any except T_MOE
C_REG        no         no      yes          Any except T_MOE
C_LABEL      no         no      no           T_NULL
C_MOS        no         yes     yes          Any except T_MOE
C_ARG        yes        no      yes          Any except T_MOE
C_STRTAG     no         no      no           T_STRUCT
C_MOU        no         yes     yes          Any except T_MOE
C_UNTAG      no         no      no           T_UNION
C_TPDEF      no         yes     yes          Any except T_MOE
C_ENTAG      no         no      no           T_ENUM
C_MOE        no         no      no           T_MOE
C_REGPARM    no         no      yes          Any except T_MOE
C_FIELD      no         no      no           T_ENUM, T_UCHAR,
                                             T_USHORT, T_UINT,
                                             T_ULONG
C_BLOCK      no         no      no           T_NULL
C_FCN        no         no      no           T_NULL
C_EOS        no         no      no           T_NULL
C_FILE       no         no      no           T_NULL
C_ALIAS      no         no      no           T_STRUCT, T_UNION,
                                             T_ENUM

Figure 11-32: Type Entries by Storage Class 



Conditions for the d entries apply to d1 through d6, except that it is 
impossible to have two consecutive derived types of function. 

Although function arguments can be declared as arrays, they are 
changed to pointers by default. Therefore, no function argument can have 
array as its first derived type. 

Structure for Symbol Table Entries 

The C language structure declaration for the symbol table entry is given 
in Figure 11-33. This declaration may be found in the header file syms.h. 

    struct syment
    {
        union
        {
            char        _n_name[SYMNMLEN];  /* symbol name */
            struct
            {
                long    _n_zeroes;          /* symbol name */
                long    _n_offset;          /* location in string table */
            } _n_n;
            char        *_n_nptr[2];        /* allows overlaying */
        } _n;
        unsigned long   n_value;            /* value of symbol */
        short           n_scnum;            /* section number */
        unsigned short  n_type;             /* type and derived */
        char            n_sclass;           /* storage class */
        char            n_numaux;           /* number of aux entries */
    };

    #define n_name      _n._n_name
    #define n_zeroes    _n._n_n._n_zeroes
    #define n_offset    _n._n_n._n_offset
    #define n_nptr      _n._n_nptr[1]

    #define SYMNMLEN    8
    #define SYMESZ      18  /* size of a symbol table entry */

Figure 11-33: Symbol Table Entry Declaration 
Auxiliary Table Entries 

An auxiliary table entry of a symbol contains the same number of bytes 
as the symbol table entry. However, unlike symbol table entries, the format 
of an auxiliary table entry of a symbol depends on its type and storage class. 
They are summarized in Figure 11-34. 





                Storage        Type Entry               Auxiliary
Name            Class          d1        typ            Entry Format

.file           C_FILE         DT_NON    T_NULL         filename
.text, .data,   C_STAT         DT_NON    T_NULL         section
.bss
tagname         C_STRTAG       DT_NON    T_NULL         tag name
                C_UNTAG
                C_ENTAG
.eos            C_EOS          DT_NON    T_NULL         end of structure
fcname          C_EXT          DT_FCN    (Note 1)       function
                C_STAT
arrname         (Note 2)       DT_ARY    (Note 1)       array
.bb, .eb        C_BLOCK        DT_NON    T_NULL         beginning and
                                                        end of block
.bf, .ef        C_FCN          DT_NON    T_NULL         beginning and
                                                        end of function
name related    (Note 2)       DT_PTR,   T_STRUCT,      name related
to structure,                  DT_ARY,   T_UNION,       to structure,
union,                         DT_NON    T_ENUM         union,
enumeration                                             enumeration

Figure 11-34: Auxiliary Symbol Table Entries 



Notes to Figure 11-34: 

1. Any except T_MOE. 

2. C_AUTO, C_STAT, C_MOS, C_MOU, C_TPDEF. 
In Figure 11-34, tagname means any symbol name including the special 
symbol .xfake, and fcname and arrname represent any symbol name for a 
function or an array respectively. Any symbol that satisfies more than one 
condition in Figure 11-34 should have a union format in its auxiliary entry. 



NOTE 



It is a mistake to assume how many auxiliary entries are associated with 
any given symbol table entry. This information is available, and should 
be obtained from the n_numaux field in the symbol table. 
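
One consequence of this note is that a loop over the symbol table should
advance by one plus n_numaux at each step, so that auxiliary entries are
never misread as symbols. A minimal sketch follows; struct sym_slot and
count_primary_symbols are hypothetical names, and only the n_numaux field is
modelled.

```c
#include <stddef.h>

/* Hypothetical in-memory slot; a real entry carries the other fields
   of struct syment as well. */
struct sym_slot {
    char n_numaux;   /* number of auxiliary entries that follow */
};

/* Count the symbols in a table of nsyms slots, where each auxiliary
   entry also occupies one slot. Auxiliary slots are skipped by using
   the n_numaux value of the symbol that owns them. */
size_t count_primary_symbols(const struct sym_slot *tab, size_t nsyms)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < nsyms; i += 1 + (size_t)(unsigned char)tab[i].n_numaux)
        count++;
    return count;
}
```

The contents of a skipped slot are never inspected, which is the point:
whatever bytes an auxiliary entry holds, they are not interpreted as an
n_numaux value.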



Filenames 

Each of the auxiliary table entries for a filename contains a 14-character 
filename in bytes 0 through 13. The remaining bytes are 0. 

Sections 

The auxiliary table entries for sections have the format as shown in Fig- 
ure 11-35. 



Bytes   Declaration      Name       Description

0-3     long int         x_scnlen   section length
4-5     unsigned short   x_nreloc   number of relocation entries
6-7     unsigned short   x_nlinno   number of line numbers
8-17    -                -          unused (filled with zeroes)

Figure 11-35: Format for Auxiliary Table Entries for Sections 



Tag Names 

The auxiliary table entries for tag names have the format shown in 
Figure 11-36. 

Bytes   Declaration      Name       Description

0-5     -                -          unused (filled with zeroes)
6-7     unsigned short   x_size     size of structure, union, and
                                    enumeration
8-11    -                -          unused (filled with zeroes)
12-15   long int         x_endndx   index of next entry beyond this
                                    structure, union, or enumeration
16-17   -                -          unused (filled with zeroes)

Figure 11-36: Tag Names Table Entries 



End of Structures 

The auxiliary table entries for the end of structures have the format 
shown in Figure 11-37: 



Bytes   Declaration      Name       Description

0-3     long int         x_tagndx   tag index
4-5     -                -          unused (filled with zeroes)
6-7     unsigned short   x_size     size of structure, union, or
                                    enumeration
8-17    -                -          unused (filled with zeroes)

Figure 11-37: Table Entries for End of Structures 



Functions 

The auxiliary table entries for functions have the format shown in 
Figure 11-38: 

Bytes   Declaration      Name        Description

0-3     long int         x_tagndx    tag index
4-7     long int         x_fsize     size of function (in bytes)
8-11    long int         x_lnnoptr   file pointer to line number
12-15   long int         x_endndx    index of next entry beyond this
                                     point
16-17   unsigned short   x_tvndx     index of the function's address
                                     in the transfer vector table
                                     (not used in UNIX system)

Figure 11-38: Table Entries for Functions 



Arrays 

The auxiliary table entries for arrays have the format shown in Figure 
11-39. Defining arrays having more than four dimensions produces a warn- 
ing message. 



Bytes   Declaration      Name         Description

0-3     long int         x_tagndx     tag index
4-5     unsigned short   x_lnno       line number of declaration
6-7     unsigned short   x_size       size of array
8-9     unsigned short   x_dimen[0]   first dimension
10-11   unsigned short   x_dimen[1]   second dimension
12-13   unsigned short   x_dimen[2]   third dimension
14-15   unsigned short   x_dimen[3]   fourth dimension
16-17   -                -            unused (filled with zeroes)

Figure 11-39: Table Entries for Arrays 

End of Blocks and Functions 

The auxiliary table entries for the end of blocks and functions have the 
format shown in Figure 11-40: 

Bytes   Declaration      Name     Description

0-3     -                -        unused (filled with zeroes)
4-5     unsigned short   x_lnno   C-source line number
6-17    -                -        unused (filled with zeroes)

Figure 11-40: End of Block and Function Entries 



Beginning of Blocks and Functions 

The auxiliary table entries for the beginning of blocks and functions 
have the format shown in Figure 11-41: 

Bytes   Declaration      Name       Description

0-3     -                -          unused (filled with zeroes)
4-5     unsigned short   x_lnno     C-source line number
6-11    -                -          unused (filled with zeroes)
12-15   long int         x_endndx   index of next entry past this block
16-17   -                -          unused (filled with zeroes)

Figure 11-41: Format for Beginning of Block and Function 



Names Related to Structures, Unions, and Enumerations 

The auxiliary table entries for structure, union, and enumeration 
symbols have the format shown in Figure 11-42: 

Bytes   Declaration      Name       Description

0-3     long int         x_tagndx   tag index
4-5     -                -          unused (filled with zeroes)
6-7     unsigned short   x_size     size of the structure, union, or
                                    enumeration
8-17    -                -          unused (filled with zeroes)

Figure 11-42: Entries for Structures, Unions, and Enumerations 



Aggregates defined by typedef may or may not have auxiliary table 
entries. For example, 

    typedef struct people STUDENT;

    struct people
    {
        char name[20];
        long id;
    };

    typedef struct people EMPLOYEE;

The symbol EMPLOYEE has an auxiliary table entry in the symbol table 
but the symbol STUDENT does not, because it is a forward reference to a 
structure. 

Auxiliary Entry Declaration 

The C language structure declaration for an auxiliary symbol table entry 
is given in Figure 11-43. This declaration may be found in the header file 
syms.h. 

    union auxent
    {
        struct
        {
            long    x_tagndx;
            union
            {
                struct
                {
                    unsigned short  x_lnno;
                    unsigned short  x_size;
                } x_lnsz;
                long    x_fsize;
            } x_misc;
            union
            {
                struct
                {
                    long    x_lnnoptr;
                    long    x_endndx;
                } x_fcn;
                struct
                {
                    unsigned short  x_dimen[DIMNUM];
                } x_ary;
            } x_fcnary;
            unsigned short  x_tvndx;
        } x_sym;
        struct
        {
            char    x_fname[FILNMLEN];
        } x_file;
        struct
        {
            long            x_scnlen;
            unsigned short  x_nreloc;
            unsigned short  x_nlinno;
        } x_scn;
        struct
        {
            long            x_tvfill;
            unsigned short  x_tvlen;
            unsigned short  x_tvran[2];
        } x_tv;
    };

    #define FILNMLEN    14
    #define DIMNUM      4
    #define AUXENT      union auxent
    #define AUXESZ      18


String Table 

Symbol table names longer than eight characters are stored contiguously 
in the string table with each symbol name delimited by a null byte. The 
first four bytes of the string table are the size of the string table in bytes; 
offsets into the string table, therefore, are greater than or equal to 4. For 
example, given a file containing two symbols with names longer than eight 
characters, long_name_1 and another_one, the string table has the format 
shown in Figure 11-44: 

    'l'    'o'    'n'    'g'
    '_'    'n'    'a'    'm'
    'e'    '_'    '1'    '\0'
    'a'    'n'    'o'    't'
    'h'    'e'    'r'    '_'
    'o'    'n'    'e'    '\0'

Figure 11-44: String Table 

The index of long_name_1 in the string table is 4 and the index of 
another_one is 16. 
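
The two indexes follow from the layout: the size field occupies bytes 0
through 3, and each name takes its length plus one byte for the terminating
null. The sketch below builds such a table in memory; struct strtab,
strtab_init, and strtab_add are hypothetical names, the 256-byte capacity is
arbitrary, and the size field is written in host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity string table for illustration. */
struct strtab {
    char    data[256];   /* bytes 0-3 hold the table size */
    int32_t size;        /* current size in bytes, including the size field */
};

void strtab_init(struct strtab *t)
{
    memset(t->data, 0, sizeof t->data);
    t->size = 4;                               /* the size field itself */
    memcpy(t->data, &t->size, sizeof t->size);
}

/* Append a null-terminated name; return its offset, or -1 if it does
   not fit. The returned offset is what n_offset would hold. */
int32_t strtab_add(struct strtab *t, const char *name)
{
    size_t  len = strlen(name) + 1;            /* include the null byte */
    int32_t off = t->size;

    if ((size_t)t->size + len > sizeof t->data)
        return -1;
    memcpy(t->data + off, name, len);
    t->size += (int32_t)len;
    memcpy(t->data, &t->size, sizeof t->size); /* keep size field current */
    return off;
}
```

Adding long_name_1 and then another_one to an empty table yields offsets 4
and 16, and a final table size of 28 bytes.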



Access Routines 

UNIX system releases contain a set of access routines that are used for 
reading the various parts of a common object file. Although the calling pro- 
gram must know the detailed structure of the parts of the object file it 
processes, the routines effectively insulate the calling program from the 
knowledge of the overall structure of the object file. 

The access routines can be divided into four categories: 

1. functions that open or close an object file 

2. functions that read header or symbol table information 

3. functions that position an object file at the start of a particular 
   section of the object file 

4. a function that returns the symbol table index for a particular 
   symbol 

These routines can be found in the library libld.a and are listed in 
Section 3 of the Programmer's Reference Manual. A summary of what is available 
can be found in the Programmer's Reference Manual under ldfcn(4). 
In Chapter 2 there was a discussion of link editor command line options 
(some of which may also be provided on the cc(1) command line). This 
chapter contains information on the Link Editor Command Language. 

The command language enables you to 

■ specify the memory configuration of the target machine 

■ combine the sections of an object file in arrangements other than the 
default 

■ bind sections to specific addresses or within specific portions of 
memory 

■ define or redefine global symbols 

Under most normal circumstances there is no compelling need to have 
such tight control over object files and where they are located in memory. 
When you do need to be very precise in controlling the link editor output, 
you do it by means of the command language. 

Link editor command language directives are passed in a file named on 
the ld(1) command line. Any file named on the command line that is not 
identifiable as an object module or an archive library is assumed to contain 
directives. The following paragraphs define terms and describe conditions 
with which you need to be familiar before you begin to use the command 
language. 



Memory Configuration 

The virtual memory of the target machine is, for purposes of allocation, 
partitioned into configured and unconfigured memory. The default 
condition is to treat all memory as configured. It is common with microprocessor 
applications, however, to have different types of memory at different 
addresses. For example, an application might have 3K of PROM 
(Programmable Read-Only Memory) beginning at address 0, and 8K of ROM 
(Read-Only Memory) starting at 20K. Addresses in the range 3K to 20K-1 
are then not configured. Unconfigured memory is treated as reserved or 
unusable by ld(1). Nothing can ever be linked into unconfigured memory. 
Thus, specifying a certain memory range to be unconfigured is one way of 
marking the addresses (in that range) illegal or nonexistent with respect to 
the linking process. Memory configurations other than the default must be 
explicitly specified by you (the user). 

Unless otherwise specified, all discussions in this document of memory, 
addresses, etc. are with respect to the configured sections of the address 
space. 
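
The PROM and ROM example can be modelled as a list of configured ranges and
a test for membership. This is only a sketch of the idea, not of how ld(1)
is implemented; struct mem_range and is_configured are hypothetical names.

```c
/* Hypothetical description of one configured memory range. */
struct mem_range {
    const char   *name;
    unsigned long origin;   /* first address of the range */
    unsigned long length;   /* size of the range in bytes */
};

/* Nonzero if addr lies inside one of the n configured ranges.
   Any address outside every range is unconfigured. */
int is_configured(const struct mem_range *r, int n, unsigned long addr)
{
    int i;

    for (i = 0; i < n; i++)
        if (addr >= r[i].origin && addr - r[i].origin < r[i].length)
            return 1;
    return 0;
}
```

With a 3K range at address 0 and an 8K range at 20K, the addresses 3K
through 20K-1 fall in neither range and are reported as unconfigured, as are
all addresses from 28K upward.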



Sections 

A section of an object file is the smallest unit of relocation and must be 
a contiguous block of memory. A section is identified by a starting address 
and a size. Information describing all the sections in a file is stored in sec- 
tion headers at the start of the file. Sections from input files are combined 
to form output sections that contain executable text, data, or a mixture of 
both. Although there may be holes or gaps between input sections and 
between output sections, storage is allocated contiguously within each out- 
put section and may not overlap a hole in memory. 



Addresses 

The physical address of a section or symbol is the relative offset from 
address zero of the address space. The physical address of an object is not 
necessarily the location at which it is placed when the process is executed. 
For example, on a system with paging, the address is with respect to address 
zero of the virtual space, and the system performs another address transla- 
tion. 



Binding 

It is often necessary to have a section begin at a specific, predefined 
address in the address space. The process of specifying this starting address 
is called binding, and the section in question is said to be bound to or 
bound at the required address. While binding is most commonly relevant 
to output sections, it is also possible to bind special absolute global symbols 
with an assignment statement in the ld(1) command language. 
Object File 

Object files are produced both by the assembler (typically as a result of 
calling the compiler) and by ld(1). ld(1) accepts relocatable object files as 
input and produces an output object file that may or may not be relocatable. 
Under certain special circumstances, the input object files given to ld(1) can 
also be absolute files. 

Files produced from the compilation system may contain, among others, 
sections called .text and .data. The .text section contains the instruction 
text (executable instructions); .data contains initialized data variables. For 
example, if a C program contained the global (i.e., not inside a function) 
declaration 

int i = 100; 

and the assignment 

i = 0; 

then compiled code from the C assignment is stored in .text, and the vari- 
able i is located in .data. 
Expressions 

Expressions may contain global symbols, constants, and most of the 
basic C language operators. (See Figure 12-2, "Syntax Diagram for Input 
Directives.") Constants are as in C, with a number recognized as decimal 
unless preceded with 0 for octal or 0x for hexadecimal. All numbers are 
treated as long integers. Symbol names may contain uppercase or 
lowercase letters, digits, and the underscore, _. Symbols within an expression 
have the value of the address of the symbol only. ld(1) does not do symbol 
table lookup to find the contents of a symbol, the dimensionality of an 
array, structure elements declared in a C program, etc. 

ld(1) uses a lex-generated input scanner to identify symbols, numbers, 
operators, etc. The current scanner design makes the following names 
reserved and unavailable as symbol names or section names: 



    ADDR     BLOCK    GROUP    NEXT     RANGE     SPARE
    ALIGN    COMMON   INFO     NOLOAD   REGIONS   PHY
    ASSIGN   COPY     LENGTH   ORIGIN   SECTIONS  TV
    BIND     DSECT    MEMORY   OVERLAY  SIZEOF

    addr     block    length   origin   sizeof
    align    group    next     phy      spare
    assign   l        o        range
    bind     len      org      s



The operators that are supported, in order of precedence from high to 
low, are shown in Figure 12-1: 

    symbol

    !   ~   -  (UNARY Minus)
    *   /   %
    +   -  (BINARY Minus)
    >>  <<
    ==  !=  >  <  <=  >=
    &
    |
    &&
    ||
    =   +=  -=  *=  /=

Figure 12-1: Operator Symbols 

The above operators have the same meaning as in the C language. 
Operators on the same line have the same precedence. 

Assignment Statements 

External symbols may be defined and assigned addresses via the 
assignment statement. The syntax of the assignment statement is 

    symbol = expression;

or 

    symbol op= expression;

where op is one of the operators +, -, *, or /. Assignment statements must 
be terminated by a semicolon. 

All assignment statements (with the exception of the one case described 
in the following paragraph) are evaluated after allocation has been per- 
formed. This occurs after all input-file-defined symbols are appropriately 
relocated but before the actual relocation of the text and data itself. There- 
fore, if an assignment statement expression contains any symbol name, the 
address used for that symbol in the evaluation of the expression reflects the 
symbol address in the output object file. References within text and data (to 
symbols given a value through an assignment statement) access this latest 
assigned value. Assignment statements are processed in the same order in 
which they are input to ld(1). 

Assignment statements are normally placed outside the scope of 
section-definition directives (see "Section Definition Directives" under "Link 
Editor Command Language"). However, there exists a special symbol, called 
dot, . , that can occur only within a section-definition directive. This symbol 
refers to the current address of ld(1)'s location counter. Thus, assignment 
expressions involving . are evaluated during the allocation phase of ld(1). 
Assigning a value to the . symbol within a section-definition directive can 
increment (but not decrement) ld(1)'s location counter and can create holes 
within the section, as described in "Section Definition Directives." 
Assigning the value of the . symbol to a conventional symbol permits the final 
allocated address (of a particular point within the link edit run) to be saved. 

align is provided as a shorthand notation to allow alignment of a sym- 
bol to an n-byte boundary within an output section, where n is a power of 
2. For example, the expression 

align(n) 

is equivalent to 

(. + n - 1) & ~(n - 1) 
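
The same rounding can be written as a C function, which makes it easy to try
out on sample addresses. In this sketch the location counter is passed in as
an ordinary argument named dot, and align_up is a hypothetical name; as in
the text, n is assumed to be a power of 2.

```c
/* Round dot up to the next multiple of n, where n is a power of 2.
   Adding n - 1 carries into the next boundary unless dot is already
   aligned; masking with ~(n - 1) then clears the low-order bits. */
unsigned long align_up(unsigned long dot, unsigned long n)
{
    return (dot + n - 1) & ~(n - 1);
}
```

An address that is already on the boundary is returned unchanged; any other
address moves forward to the next boundary and never backward. The result
is not meaningful when n is not a power of 2.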

SIZEOF and ADDR are pseudo-functions that, given the name of a sec- 
tion, return the size or address of the section respectively. They may be 
used in symbol definitions outside of section directives. 

Link editor expressions may have either an absolute or a relocatable 
value. When ld(1) creates a symbol through an assignment statement, the 
symbol's value takes on that type of expression. That type depends on the 
following rules: 

■ An expression with a single relocatable symbol (and zero or more 
constants or absolute symbols) is relocatable. 

■ The difference of two relocatable symbols from the same section is 
absolute. 

■ All other expressions are combinations of the above. 
Specifying a Memory Configuration 

MEMORY directives are used to specify 

1. The total size of the virtual space of the target machine. 

2. The configured and unconfigured areas of the virtual space. 

If no directives are supplied, ld(1) assumes that all memory is configured. 
The size of the default memory is dependent upon the target machine. 

By means of MEMORY directives, an arbitrary name of up to eight 
characters is assigned to a virtual address range. Output sections can then be 
forced to be bound to virtual addresses within specifically named memory 
areas. Memory names may contain uppercase or lowercase letters, digits, 
and the special characters $, ., or _. Names of memory ranges are used by 
ld(1) only and are not carried in the output file symbol table or headers. 

When MEMORY directives are used, all virtual memory not described in 
a MEMORY directive is considered to be unconfigured. Unconfigured 
memory is not used in ld(1)'s allocation process; hence nothing except 
DSECT sections can be link edited or bound to an address within 
unconfigured memory. 

As an option on the MEMORY directive, attributes may be associated 
with a named memory area. In future releases this may be used to provide 
error checking. Currently, error checking of this type is not implemented. 

The attributes currently accepted are 

1. R : readable memory 

2. W : writable memory 

3. X : executable, i.e., instructions may reside in this memory 

4. I : initializable, i.e., stack areas are typically not initialized 

Other attributes may be added in the future if necessary. If no attributes 
are specified on a MEMORY directive or if no MEMORY directives are sup- 
plied, memory areas assume the attributes of R, W, X, and I. 
The syntax of the MEMORY directive is 

    MEMORY
    {
        name1 (attr) : origin = n1, length = n2
        name2 (attr) : origin = n3, length = n4
        etc.
    }

The keyword origin (or org or o) must precede the origin of a memory 
range, and length (or len or l) must precede the length as shown in the 
above prototype. The origin operand refers to the virtual address of the 
memory range. origin and length are entered as long integer constants in 
either decimal, octal, or hexadecimal (standard C syntax). origin and length 
specifications, as well as individual MEMORY directives, may be separated 
by white space or a comma. 

By specifying MEMORY directives, ld(1) can be told that memory is
configured in some manner other than the default. For example, if it is
necessary to prevent anything from being linked to the first 0x10000 words
of memory, a MEMORY directive can accomplish this.

        MEMORY
        {
                valid : org = 0x10000, len = 0xEE0000
        }
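
As an illustration only, the effect of such a directive can be modeled by
treating each MEMORY entry as an (origin, length) pair and asking whether
an address falls inside one of them. The sketch below is written in Python
purely as a notation; the names MEMORY_AREAS and is_configured are
hypothetical and are not part of ld(1).

```python
# Hypothetical model of a MEMORY directive: area name -> list of (origin, length).
MEMORY_AREAS = {"valid": [(0x10000, 0xEE0000)]}


def is_configured(addr, areas=MEMORY_AREAS):
    """Return True if addr lies inside any range described in the model."""
    return any(
        origin <= addr < origin + length
        for ranges in areas.values()
        for origin, length in ranges
    )
```

Under this model every address below 0x10000 is unconfigured, so nothing
except a DSECT section could be bound there.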
Section Definition Directives

The purpose of the SECTIONS directive is to describe how input sections
are to be combined, to direct where to place output sections (both in
relation to each other and to the entire virtual memory space), and to
permit the renaming of output sections.

In the default case where no SECTIONS directives are given, all input
sections of the same name appear in an output section of that name. If two
object files are linked, one containing sections s1 and s2 and the other
containing sections s3 and s4, the output object file contains the four
sections s1, s2, s3, and s4. The order of these sections would depend on
the order in which the link editor sees the input files.

The basic syntax of the SECTIONS directive is

        SECTIONS
        {
                secname1 :
                {
                        file_specifications,
                        assignment_statements
                }

                secname2 :
                {
                        file_specifications,
                        assignment_statements
                }

                etc.
        }

The various types of section definition directives are discussed in the 
remainder of this section. 
File Specifications

Within a section definition, the files and sections of files to be
included in the output section are listed in the order in which they are
to appear in the output section. Sections from an input file are specified
by

        filename ( secname )

or

        filename ( secname1 secname2 . . . )

Sections of an input file are separated either by white space or commas,
as are the file specifications themselves.

        filename [COMMON]

may be used in the same way to refer to all the uninitialized, unallocated
global symbols in a file.

If a file name appears with no sections listed, then all sections from the
file (but not the uninitialized, unallocated globals) are linked into the
current output section. For example,

        SECTIONS
        {
                outsec1:
                {
                        file1.o (sec1)
                        file2.o
                        file3.o (sec1, sec2)
                }
        }




According to this directive, the order in which the input sections appear
in the output section outsec1 would be

1. section sec1 from file file1.o

2. all sections from file2.o, in the order they appear in the file

3. section sec1 from file file3.o, and then section sec2 from file file3.o

If there are any additional input files that contain input sections also
named outsec1, these sections are linked following the last section named
in the definition of outsec1. If there are any other input sections in
file1.o or file3.o, they will be placed in output sections with the same
names as the input sections unless they are included in other file
specifications.

The code 

*(secname) 

may be used to indicate all previously unallocated input sections of the 
given name, regardless of what input file they are contained in. 

Load a Section at a Specified Address 

Bonding of an output section to a specific virtual address is accomplished
by an ld(1) option as shown in the following SECTIONS directive example:

        SECTIONS
        {
                outsec addr:
                {
                        ...
                }
                etc.
        }




The addr is the bonding address expressed as a C constant. If outsec does
not fit at addr (perhaps because of holes in the memory configuration or
because outsec is too large to fit without overlapping some other output
section), ld(1) issues an appropriate error message. addr may also be the
word BIND, followed by a parenthesized expression. The expression may use
the pseudo-functions SIZEOF, ADDR, or NEXT. NEXT accepts a constant and
returns the first multiple of that value that falls into configured
unallocated memory; SIZEOF and ADDR accept previously defined sections.

As long as output sections do not overlap and there is enough space, they
can be bound anywhere in configured memory. The SECTIONS directives
defining output sections need not be given to ld(1) in any particular
order, unless SIZEOF or ADDR is used.
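
The behavior described for NEXT can be pictured with a small model. The
following Python sketch is an assumption-laden illustration, not ld(1)'s
implementation: it represents configured, unallocated memory as a list of
half-open (start, end) ranges and returns the lowest multiple of the given
constant that lies inside one of them. The name next_multiple is
hypothetical.

```python
def next_multiple(n, free_ranges):
    """Lowest multiple of n inside any half-open (start, end) free range.

    Returns None when no free range contains a multiple of n.
    """
    best = None
    for start, end in free_ranges:
        candidate = -(-start // n) * n  # round start up to a multiple of n
        if candidate < end and (best is None or candidate < best):
            best = candidate
    return best
```

For instance, with free ranges 0x2100-0x2FFF and 0x10400-0x1FFFF, a call
with n = 0x1000 skips the first range (it contains no multiple of 0x1000)
and yields 0x11000.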

ld(1) does not ensure that each section's size consists of an even number
of bytes or that each section starts on an even byte boundary. The
assembler ensures that the size (in bytes) of a section is evenly
divisible by 4. ld(1) directives can be used to force a section to start
on an odd byte boundary although this is not recommended. If a section
starts on an odd byte boundary, the section's contents are either accessed
incorrectly or are not executed properly. When a user specifies an odd
byte boundary, ld(1) issues a warning message.

Aligning an Output Section 

It is possible to request that an output section be bound to a virtual
address that falls on an n-byte boundary, where n is a power of 2. The
ALIGN option of the SECTIONS directive performs this function, so that the
option

        ALIGN(n)

is equivalent to specifying a bonding address of

        (. + n - 1) & ~(n - 1)

For example,

        SECTIONS
        {
                outsec ALIGN(0x20000) :
                {
                        ...
                }
                etc.
        }




The output section outsec is not bound to any given address but is placed at 
some virtual address that is a multiple of 0x20000 (e.g., at address 0x0, 
0x20000, 0x40000, 0x60000, etc.). 
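
The rounding that ALIGN(n) performs is ordinary power-of-2 arithmetic and
can be checked by hand. The Python sketch below is only a worked model of
the expression given above; the function name align_up is hypothetical.

```python
def align_up(addr, n):
    """Round addr up to the next multiple of n, where n is a power of 2."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    return (addr + n - 1) & ~(n - 1)


if __name__ == "__main__":
    for addr in (0x0, 0x1, 0x1FFFF, 0x20000, 0x20001):
        print(hex(addr), "->", hex(align_up(addr, 0x20000)))
```

An address that is already a multiple of n is left unchanged; any other
address moves up to the next multiple.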

Grouping Sections Together 

The default allocation algorithm for ld(1)

1. Links all input .init sections together, followed by .text sections,
   into one output section. This output section is called .text and is
   bound to an address of 0x0 plus the size of all headers in the output
   file.

2. Links all input .data sections together into one output section. This
   output section is called .data and, in paging systems, is bound to an
   address aligned to a machine dependent constant plus a number dependent
   on the size of headers and text.

3. Links all input .bss sections together with all uninitialized,
   unallocated global symbols, into one output section. This output
   section is called .bss and is allocated so as to immediately follow the
   output section .data. Note that the output section .bss is not given
   any particular address alignment.

Specifying any SECTIONS directives results in this default allocation not
being performed. Rather than relying on the ld(1) default algorithm, if
you are manipulating COFF files, the one certain way of determining
address and order information is to take it from the file and section
headers.
The default allocation of ld(1) is equivalent to supplying the following
directive:

        SECTIONS
        {
                .text sizeof_headers : { *(.init) *(.text) }
                GROUP BIND( NEXT(align_value) +
                        ((SIZEOF(.text) + ADDR(.text)) % 0x2000)) :
                {
                        .data : { }
                        .bss  : { }
                }
        }

where align_value is a machine dependent constant. The GROUP command
ensures that the two output sections, .data and .bss, are allocated (i.e.,
grouped) together. Bonding or alignment information is supplied only for
the group and not for the output sections contained within the group. The
sections making up the group are allocated in the order listed in the
directive.

If .text, .data, and .bss are to be placed in the same segment, the
following SECTIONS directive is used:

        SECTIONS
        {
                GROUP :
                {
                        .text : { }
                        .data : { }
                        .bss  : { }
                }
        }




Note that there are still three output sections (.text, .data, and .bss), but 
now they are allocated into consecutive virtual memory. 

This entire group of output sections could be bound to a starting address
or aligned simply by adding a field to the GROUP directive. To bind to
0xC0000, use

        GROUP 0xC0000 : {

To align to 0x10000, use

        GROUP ALIGN(0x10000) : {

With this addition, first the output section .text is bound at 0xC0000 (or
is aligned to 0x10000); then the remaining members of the group are
allocated in order of their appearance into the next available memory
locations.

When the GROUP directive is not used, each output section is treated as an
independent entity:

        SECTIONS
        {
                .text : { }
                .data ALIGN(0x20000) : { }
                .bss : { }
        }

The .text section starts at virtual address 0x0 (if it is in configured
memory) and the .data section at a virtual address aligned to 0x20000. The
.bss section follows immediately after the .text section if there is
enough space. If there is not, it follows the .data section. The order in
which output sections are defined to ld(1) cannot be used to force a
certain allocation order in the output file.

Creating Holes Within Output Sections 

The special symbol dot, ., appears only within section definitions and
assignment statements. When it appears on the left side of an assignment
statement, . causes ld(1)'s location counter to be incremented or reset
and a hole left in the output section. Holes built into output sections in
this manner take up physical space in the output file and are initialized
using a fill character (either the default fill character (0x00) or a
supplied fill character). See the definition of the -f option in "Using
the Link Editor" and the discussion of filling holes in "Initialized
Section Holes or .bss Sections" in this chapter.

Consider the following section definition:

        outsec:
        {
                . += 0x1000;
                f1.o (.text)
                . += 0x100;
                f2.o (.text)
                . = align (4);
                f3.o (.text)
        }



The effect of this command is as follows:

1. A 0x1000 byte hole, filled with the default fill character, is left at
   the beginning of the section. Input section f1.o (.text) is linked
   after this hole.

2. The .text section of input file f2.o begins at 0x100 bytes following
   the end of f1.o (.text).

3. The .text section of f3.o is linked to start at the next full word
   boundary following the .text section of f2.o with respect to the
   beginning of outsec.

For the purposes of allocating and aligning addresses within an output
section, ld(1) treats the output section as if it began at address zero.
As a result, if, in the above example, outsec ultimately is linked to
start at an odd address, then the part of outsec built from f3.o (.text)
also starts at an odd address, even though f3.o (.text) is aligned to a
full word boundary. This is prevented by specifying an alignment factor
for the entire output section.

outsec ALIGN(4) : { 

It should be noted that the assembler, as, always pads the sections it 
generates to a full word length making explicit alignment specifications 
unnecessary. This also holds true for the compiler. 
Expressions that decrement . are illegal. For example, subtracting a
value from the location counter is not allowed since overwrites are not
allowed. The most common operators in expressions that assign a value to .
are += and align.
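
To make the movement of the location counter concrete, the outsec
definition discussed in this section can be traced with a small model. The
Python sketch below is illustrative only: the input section sizes are
invented, the size of f2.o (.text) is deliberately not a multiple of 4 so
that the align step has a visible effect (real assembler output is padded,
as noted above), and the names layout, OUTSEC, and SIZES are hypothetical.

```python
def layout(statements, sizes):
    """Trace the location counter through one output section.

    Offsets are relative to the start of the output section, which is
    treated as address zero. Returns (offsets by input section, final dot).
    """
    dot = 0
    offsets = {}
    for kind, arg in statements:
        if kind == "skip":  # models ". += arg;"
            dot += arg
        elif kind == "align":  # models ". = align(arg);"
            dot = (dot + arg - 1) & ~(arg - 1)
        elif kind == "input":  # models "file (section)"
            offsets[arg] = dot
            dot += sizes[arg]
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return offsets, dot


OUTSEC = [
    ("skip", 0x1000), ("input", "f1.o(.text)"),
    ("skip", 0x100), ("input", "f2.o(.text)"),
    ("align", 4), ("input", "f3.o(.text)"),
]
# Invented sizes, for illustration only.
SIZES = {"f1.o(.text)": 0x230, "f2.o(.text)": 0x46, "f3.o(.text)": 0x100}
```

With these sizes, f1.o (.text) lands at offset 0x1000, f2.o (.text) at
0x1330, and f3.o (.text) at 0x1378, the first multiple of 4 at or beyond
0x1376.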

Creating and Defining Symbols at Link-Edit Time

The assignment instruction of ld(1) can be used to give symbols a value
that is link-edit dependent. Typically, there are three types of
assignments:

1. Use of . to adjust ld(1)'s location counter during allocation.

2. Use of . to assign an allocation-dependent value to a symbol.

3. Assigning an allocation-independent value to a symbol.

Case 1) has already been discussed in the previous section.

Case 2) provides a means to assign addresses (known only after allocation)
to symbols. For example,



        SECTIONS
        {
                outsc1: {...}
                outsc2:
                {
                        file1.o (s1)
                        s2_start = . ;
                        file2.o (s2)
                        s2_end = . - 1;
                }
        }

The symbol s2_start is defined to be the address of file2.o(s2), and
s2_end is the address of the last byte of file2.o(s2).
Consider the following example:

        SECTIONS
        {
                outsc1:
                {
                        file1.o (.data)
                        mark = . ;
                        . += 4;
                        file2.o (.data)
                }
        }

In this example, the symbol mark is created and is equal to the address of
the first byte beyond the end of file1.o's .data section. Four bytes are
reserved for a future run-time initialization of the symbol mark. The type
of the symbol is a long integer (32 bits).

Assignment instructions involving . must appear within SECTIONS
definitions since they are evaluated during allocation. Assignment
instructions that do not involve . can appear within SECTIONS definitions
but typically do not. Such instructions are evaluated after allocation is
complete. Reassignment of a defined symbol to a different address is
dangerous. For example, if a symbol within .data is defined, initialized,
and referenced within a set of object files being link-edited, the symbol
table entry for that symbol is changed to reflect the new, reassigned
physical address. However, the associated initialized data is not moved to
the new address, and there may be references to the old address. ld(1)
issues warning messages for each defined symbol that is being redefined
within an ifile. However, assignments of absolute values to new symbols
are safe because there are no references or initialized data associated
with the symbol.
Allocating a Section Into Named Memory

It is possible to specify that a section be linked (somewhere) within a
specific named memory (as previously specified on a MEMORY directive).
(The > notation is borrowed from the UNIX system concept of redirected
output.) For example,

        MEMORY
        {
                mem1:      o=0x000000 l=0x10000
                mem2 (RW): o=0x020000 l=0x40000
                mem3 (RW): o=0x070000 l=0x40000
                mem1:      o=0x120000 l=0x04000
        }

        SECTIONS
        {
                outsec1: { f1.o(.data) } > mem1
                outsec2: { f2.o(.data) } > mem3
        }

This directs ld(1) to place outsec1 anywhere within the memory area named
mem1 (i.e., somewhere within the address range 0x0-0xFFFF or
0x120000-0x123FFF). The outsec2 is to be placed somewhere in the address
range 0x70000-0xAFFFF.

Initialized Section Holes or .bss Sections 

When holes are created within a section (as in the example in "Creating
Holes Within Output Sections"), ld(1) normally puts out bytes of zero as
fill. By default, .bss sections are not initialized at all; that is, no
initialized data is generated for any .bss section by the assembler nor
supplied by the link editor, not even zeros.

Initialization options can be used in a SECTIONS directive to set such
holes or output .bss sections to an arbitrary 2-byte pattern. Such
initialization options apply only to .bss sections or holes. As an
example, an application might want an uninitialized data table to be
initialized to a constant value without recompiling the file or a hole in
the text area to be filled with a transfer to an error routine.

Either specific areas within an output section or the entire output
section may be specified as being initialized. However, since no text is
generated for an uninitialized .bss section, if part of such a section is
initialized, then the entire section is initialized. In other words, if a
.bss section is to be combined with a .text or .data section (both of
which are initialized) or if part of an output .bss section is to be
initialized, then one of the following will hold:

1. Explicit initialization options must be used to initialize all .bss
   sections in the output section.

2. ld(1) will use the default fill value to initialize all .bss sections
   in the output section.
Consider the following ld(1) ifile:

        SECTIONS
        {
                sec1:
                {
                        f1.o
                        . += 0x200;
                        f2.o (.text)
                } = 0xDFFF
                sec2:
                {
                        f1.o (.bss)
                        f2.o (.bss) = 0x1234
                }
                sec3:
                {
                        f3.o (.bss)
                        ...
                } = 0xFFFF
                sec4: { f4.o (.bss) }
        }

In the example above, the 0x200 byte hole in section sec1 is filled with
the value 0xDFFF. In section sec2, f1.o(.bss) is initialized to the
default fill value of 0x00, and f2.o(.bss) is initialized to 0x1234. All
.bss sections within sec3 as well as all holes are initialized to 0xFFFF.
Section sec4 is not initialized; that is, no data is written to the object
file for this section.
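
The idea of filling a hole with a repeating 2-byte pattern can be sketched
as follows. This Python model is an illustration under stated assumptions:
it writes the high-order byte of the pattern first, whereas the actual
byte order is a property of the target machine, and the name fill_hole is
hypothetical.

```python
def fill_hole(length, pattern=0x0000):
    """Return length bytes formed by repeating a 2-byte fill pattern.

    Assumption: the high-order byte of the pattern is written first.
    """
    pair = pattern.to_bytes(2, "big")
    return (pair * ((length + 1) // 2))[:length]
```

Calling fill_hole(0x200, 0xDFFF) models the hole in sec1; calling it with
no pattern models the default fill of zeros.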
Changing the Entry Point

The UNIX system a.out optional header contains a field for the (primary)
entry point of the file. This field is set using one of the following
rules (listed in the order they are applied):

1. The value of the symbol specified with the -e option, if present, is
   used.

2. The value of the symbol _start, if present, is used.

3. The value of the symbol main, if present, is used.

4. The value zero is used.

Thus, an explicit entry point can be assigned to this a.out header field
through the -e option or by using an assignment instruction in an ifile of
the form

        _start = expression;

If ld(1) is called through cc(1), a startup routine is automatically
linked in. Then, when the program is executed, the routine exit(1) is
called after the main routine finishes to close file descriptors and do
other cleanup. The user must therefore be careful when calling ld(1)
directly or when changing the entry point. The user must supply the
startup routine or make sure that the program always calls exit rather
than falling through the end. Otherwise, the program will dump core.
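
The order in which the entry point rules are applied can be written out as
a short decision procedure. The Python sketch below is a model only; the
name entry_point is hypothetical, and raising an error when the -e symbol
is undefined is a choice made for this sketch, since the text does not say
what ld(1) does in that case.

```python
def entry_point(symbols, e_option=None):
    """Pick the entry point value from a {symbol name: value} table.

    e_option is the symbol named with -e, or None if -e was not given.
    Assumption: an undefined -e symbol raises KeyError in this model.
    """
    if e_option is not None:
        return symbols[e_option]
    for name in ("_start", "main"):
        if name in symbols:
            return symbols[name]
    return 0
```

With both _start and main defined and no -e option, the value of _start is
chosen; with neither defined, the entry point is zero.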

Use of Archive Libraries 

Each member of an archive library (e.g., libc.a) is a complete object
file. Archive libraries are created with the ar(1) command from object
files generated by cc or as. An archive library is always processed using
selective inclusion: only those members that resolve existing
undefined-symbol references are taken from the library for link editing.
Libraries can be placed both inside and outside section definitions. In
both cases, a member of a library is included for linking whenever

1. There exists a reference to a symbol defined in that member.

2. The reference is found by ld(1) prior to the actual scanning of the
   library.

When a library member is included by searching the library inside a
SECTIONS directive, all input sections from the library member are
included in the output section being defined. When a library member is
included by searching the library outside of a SECTIONS directive, all
input sections from the library member are included into the output
section with the same name. If necessary, new output sections are defined
to provide a place to put the input sections. Note, however, that

1. Specific members of a library cannot be referenced explicitly in an
   ifile.

2. The default rules for the placement of members and sections cannot be
   overridden when they apply to archive library members.

The -l option is a shorthand notation for specifying an input file coming
from a predefined set of directories and having a predefined name. By
convention, such files are archive libraries. However, they need not be
so. Furthermore, archive libraries can be specified without using the -l
option by simply giving the (full or relative) UNIX system file path.

The ordering of archive libraries is important since for a member to be
extracted from the library it must satisfy a reference that is known to be
unresolved at the time the library is searched. Archive libraries can be
specified more than once. They are searched every time they are
encountered. Archive files have a symbol table at the beginning of the
archive. ld(1) will cycle through this symbol table until it has
determined that it cannot resolve any more references from that library.

Consider the following example:

1. The input files file1.o and file2.o each contain a reference to the
   external function FCN.

2. Input file1.o contains a reference to symbol ABC.

3. Input file2.o contains a reference to symbol XYZ.

4. Library liba.a, member 0, contains a definition of XYZ.

5. Library libc.a, member 0, contains a definition of ABC.

6. Both libraries have a member 1 that defines FCN.

If the ld(1) command were entered as

        ld file1.o -la file2.o -lc

then the FCN references are satisfied by liba.a, member 1, ABC is obtained
from libc.a, member 0, and XYZ remains undefined (because the library
liba.a is searched before file2.o is specified). If the ld(1) command were
entered as

        ld file1.o file2.o -la -lc

then the FCN references are satisfied by liba.a, member 1, ABC is obtained
from libc.a, member 0, and XYZ is obtained from liba.a, member 0. If the
ld(1) command were entered as

        ld file1.o file2.o -lc -la

then the FCN references are satisfied by libc.a, member 1, ABC is obtained
from libc.a, member 0, and XYZ is obtained from liba.a, member 0.
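
The three outcomes above follow mechanically from the selective inclusion
rule, and a small simulation reproduces them. The Python sketch below is a
model of the rule as the text states it, not of ld(1) itself; libraries
are named directly (liba.a rather than -la), and the names link, OBJECTS,
and ARCHIVES are hypothetical.

```python
def link(args, objects, archives):
    """Simulate symbol resolution for inputs processed left to right.

    objects:  {file name: (defined symbols, referenced symbols)}
    archives: {library name: [(defined symbols, referenced symbols), ...]}
    Returns ({symbol: where it was defined}, set of undefined symbols).
    """
    defined, undefined = {}, set()

    def add(origin, defs, refs):
        for sym in defs:
            defined.setdefault(sym, origin)
        undefined.difference_update(defs)
        undefined.update(s for s in refs if s not in defined)

    for arg in args:
        if arg in objects:
            add(arg, *objects[arg])
            continue
        taken = set()
        progress = True
        while progress:  # cycle until the library resolves nothing more
            progress = False
            for index, (defs, refs) in enumerate(archives[arg]):
                if index not in taken and undefined & set(defs):
                    add(f"{arg}[{index}]", defs, refs)
                    taken.add(index)
                    progress = True
    return defined, undefined


OBJECTS = {
    "file1.o": ((), ("FCN", "ABC")),
    "file2.o": ((), ("FCN", "XYZ")),
}
ARCHIVES = {
    "liba.a": [(("XYZ",), ()), (("FCN",), ())],
    "libc.a": [(("ABC",), ()), (("FCN",), ())],
}
```

Passing the inputs in the order file1.o, liba.a, file2.o, libc.a leaves
XYZ in the undefined set, because liba.a has already been searched by the
time file2.o introduces the reference.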

The -u option is used to force the linking of library members when the
link edit run does not contain an actual external reference to the
members. For example,

        ld -u rout1 -la

creates an undefined symbol called rout1 in ld(1)'s global symbol table.
If any member of library liba.a defines this symbol, it (and perhaps other
members as well) is extracted. Without the -u option, there would have
been no unresolved references or undefined symbols to cause ld(1) to
search the archive library.



Dealing With Holes in Physical Memory 

When memory configurations are defined such that unconfigured areas exist
in the virtual memory, each application or user must assume the
responsibility of forming output sections that will fit into memory. For
example, assume that memory is configured as follows:

        MEMORY
        {
                mem1: o = 0x00000 l = 0x02000
                mem2: o = 0x40000 l = 0x05000
                mem3: o = 0x20000 l = 0x10000
        }

Let the files f1.o, f2.o, . . . fn.o each contain three sections .text,
.data, and .bss, and suppose the combined .text section is 0x12000 bytes.
There is no configured area of memory in which this section can be placed.
Appropriate directives must be supplied to break up the .text output
section so ld(1) may do allocation. For example,

        SECTIONS
        {
                txt1:
                {
                        f1.o (.text)
                        f2.o (.text)
                        f3.o (.text)
                }
                txt2:
                {
                        f4.o (.text)
                        f5.o (.text)
                        f6.o (.text)
                }
                etc.
        }
Allocation Algorithm

An output section is formed either as a result of a SECTIONS directive, by
combining input sections of the same name, or by combining .text and .init
into .text. An output section can have zero or more input sections
comprising it. After the composition of an output section is determined,
it must then be allocated into configured virtual memory. ld(1) uses an
algorithm that attempts to minimize fragmentation of memory, and hence
increases the possibility that a link edit run will be able to allocate
all output sections within the specified virtual memory configuration. The
algorithm proceeds as follows:

1. Any output sections for which explicit bonding addresses were specified
   are allocated.

2. Any output sections to be included in a specific named memory are
   allocated. In both this and the succeeding step, each output section is
   placed into the first available space within the (named) memory with
   any alignment taken into consideration.

3. Output sections not handled by one of the above steps are allocated.

If all memory is contiguous and configured (the default case), and no
SECTIONS directives are given, then output sections are allocated in the
order they appear to ld(1). Otherwise, output sections are allocated in
the order they were defined or made known to ld(1) into the first
available space they fit.
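
The three steps can be sketched as three passes over the output sections,
each taking the first available space. The Python model below is a
simplified illustration, not ld(1)'s algorithm: it ignores alignment, it
treats memory as a list of free ranges, and the names allocate, take, and
first_fit are hypothetical.

```python
def allocate(sections, memory):
    """Place sections in three passes; return {section name: address or None}.

    sections: list of dicts with "name", "size", and optionally "addr"
              (explicit bonding address) or "mem" (named memory).
    memory:   {area name: [(origin, length), ...]}
    Alignment is ignored in this sketch.
    """
    free = sorted(
        [origin, origin + length, area]
        for area, ranges in memory.items()
        for origin, length in ranges
    )
    placed = {}

    def take(start, size):
        """Remove [start, start + size) from the free list if it is free."""
        for i, (lo, hi, area) in enumerate(free):
            if lo <= start and start + size <= hi:
                pieces = ([lo, start, area], [start + size, hi, area])
                free[i:i + 1] = [p for p in pieces if p[0] < p[1]]
                return True
        return False

    def first_fit(size, area=None):
        """Take size bytes from the first free range that can hold them."""
        for lo, hi, name in free:
            if (area is None or name == area) and hi - lo >= size:
                take(lo, size)
                return lo
        return None

    for s in sections:  # step 1: explicit bonding addresses
        if s.get("addr") is not None:
            ok = take(s["addr"], s["size"])
            placed[s["name"]] = s["addr"] if ok else None
    for s in sections:  # step 2: named memory
        if s.get("addr") is None and s.get("mem"):
            placed[s["name"]] = first_fit(s["size"], s["mem"])
    for s in sections:  # step 3: everything else
        if s.get("addr") is None and not s.get("mem"):
            placed[s["name"]] = first_fit(s["size"])
    return placed
```

Applied to the three-area memory configuration shown under "Dealing With
Holes in Physical Memory," a single 0x12000 byte .text section cannot be
placed by this model (the result is None), whereas the same text split
into smaller output sections can.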



Incremental Link Editing 

As previously mentioned, the output of ld(1) can be used as an input file
to subsequent ld(1) runs providing that the relocation information is
retained (-r option). Large applications may find it desirable to
partition their C programs into subsystems, link each subsystem
independently, and then link edit the entire application. For example,

Step 1:

        ld -r -o outfile1 ifile1 infile1.o

        SECTIONS
        {
                ss1:
                {
                        f1.o
                        f2.o
                        ...
                        fn.o
                }
        }

Step 2:

        ld -r -o outfile2 ifile2 infile2.o

        SECTIONS
        {
                ss2:
                {
                        g1.o
                        g2.o
                        ...
                        gn.o
                }
        }

Step 3:

        ld -a -o final.out outfile1 outfile2

By judiciously forming subsystems, applications may achieve a form of
incremental link editing whereby it is necessary to relink only a portion
of the total link edit when a few files are recompiled.

To apply this technique, there are two simple rules:

1. Intermediate link edits should contain only SECTIONS declarations and
   be concerned only with the formation of output sections from input
   files and input sections. No binding of output sections should be done
   in these runs.

2. All allocation and memory directives, as well as any assignment
   statements, are included only in the final ld(1) call.



DSECT, COPY, NOLOAD, INFO, and OVERLAY Sections

Sections may be given a type in a section definition as shown in the
following example:

        SECTIONS
        {
                name1 0x200000 (DSECT)   : { file1.o }
                name2 0x400000 (COPY)    : { file2.o }
                name3 0x600000 (NOLOAD)  : { file3.o }
                name4 (INFO)             : { file4.o }
                name5 0x900000 (OVERLAY) : { file5.o }
        }

The DSECT option creates what is called a dummy section. A dummy section
has the following properties:

1. It does not participate in the memory allocation for output sections.
   As a result, it takes up no memory and does not show up in the memory
   map generated by ld(1).

2. It may overlay other output sections and even unconfigured memory.
   DSECTs may overlay other DSECTs.

3. The global symbols defined within the dummy section are relocated
   normally. That is, they appear in the output file's symbol table with
   the same value they would have had if the DSECT were actually loaded at
   its virtual address. DSECT-defined symbols may be referenced by other
   input sections. Undefined external symbols found within a DSECT cause
   specified archive libraries to be searched and any members which define
   such symbols are link edited normally (i.e., not as a DSECT).

4. None of the section contents, relocation information, or line number
   information associated with the section is written to the output file.

In the above example, none of the sections from file1.o are allocated, but
all symbols are relocated as though the sections were link edited at the
specified address. Other sections could refer to any of the global symbols
and they are resolved correctly.

A copy section created by the COPY option is similar to a dummy section.
The only difference between a copy section and a dummy section is that the
contents of a copy section and all associated information are written to
the output file.

An INFO section is the same as a COPY section but its purpose is to carry
information about the object file whereas the COPY section may contain
valid text and data. INFO sections are usually used to contain file
version identification information.

A section with the type of NOLOAD differs in only one respect from a
normal output section: its text and/or data is not written to the output
file. A NOLOAD section is allocated virtual space, appears in the memory
map, etc.

An OVERLAY section is relocated and written to the output file. It is
different from a normal section in that it is not allocated and may
overlay other sections or unconfigured memory.

Output File Blocking 

The BLOCK option (applied to any output section or GROUP directive) is
used to direct ld(1) to align a section at a specified byte offset in the
output file. It has no effect on the address at which the section is
allocated nor on any part of the link edit process. It is used purely to
adjust the physical position of the section in the output file.

        SECTIONS
        {
                .text BLOCK(0x200) : { }
                .data ALIGN(0x20000) BLOCK(0x200) : { }
        }

With this SECTIONS directive, ld(1) assures that each section, .text and
.data, is physically written at a file offset that is a multiple of 0x200
(e.g., at an offset of 0, 0x200, 0x400, and so forth, in the file).



Nonrelocatable Input Files 

If a file produced by ld(1) is intended to be used in a subsequent ld(1)
run, the first ld(1) run should have the -r option set. This preserves
relocation information and permits the sections of the file to be
relocated by the subsequent run.

If an input file to ld(1) does not have relocation or symbol table
information (perhaps from the action of a strip(1) command, or from being
link edited without a -r option or with a -s option), the link edit run
continues using the nonrelocatable input file.

For such a link edit to be successful (i.e., to actually and correctly
link edit all input files, relocate all symbols, resolve unresolved
references, etc.), two conditions on the nonrelocatable input files must
be met.

1. Each input file must have no unresolved external references.

2. Each input file must be bound to the exact same virtual address as it
   was bound to in the ld(1) run that created it.

NOTE
If these two conditions are not met for all nonrelocatable input files, no
error messages are issued. Because of this fact, extreme care must be
taken when supplying such input files to ld(1).
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Directives 



Expanded Directives 



<ifile> 
<cmd> 



< memory > 

< memory spec > 

< attributes > 
<origin_spec> 
<lenth_spec> 

< origin > 

< length > 



{<cmd>} 

< memory > 

< sections > 

< assignment > 

< filename > 
<flags> 

MEMORY { <memory_spec> 
I [ ' ] <memory_spec> ] } 
<name> [ < attributes > ] : 
<origin_spec> [ , ] <length_spec> 
( {R|W|X|I) ) 

< origin > = <long> 

< length > = <long> 
ORIGIN I o I org I origin 
LENGTH |1 1 len I length 



Figure 12-2: Syntax Diagram for Input Directives 



Two punctuation symbols, square brackets and curly braces, do double 
NOTE duty in this diagram. 

I Where the actual symbols, [] and {) are used, they are part of the syn- 
tax and must be present when the directive is specified. 

Where you see the symbols [ and ] (larger and in bold), it means the 
material enclosed is optional. 

Where you see the symbols {and } (larger and in bold), it means mul- 
tiple occurrences of the material enclosed are permitted. 
Directives          Expanded Directives

<sections>          SECTIONS { {<sec_or_group>} }
<sec_or_group>      <section> | <group> | <library>
<group>             GROUP <group_options> : {
                        <section_list> } [<mem_spec>]
<section_list>      <section> { [,] <section> }
<section>           <name> <sec_options> :
                        { <statement> }
                        [<fill>] [<mem_spec>]
<group_options>     [<addr>] | [<align_option>] [<block_option>]
<sec_options>       [<addr>] | [<align_option>]
                        [<block_option>] [<type_option>]
<addr>              <long> | <bind> ( <expr> )
<align_option>      <align> ( <expr> )
<align>             ALIGN | align
<block_option>      <block> ( <long> )
<block>             BLOCK | block
<type_option>       (DSECT) | (NOLOAD) | (COPY)
                        | (INFO) | (OVERLAY)
<fill>              = <long>
<mem_spec>          > <name>
                    > <attributes>
<statement>         <filename>
                    <filename> ( <name_list> ) | [COMMON]
                    * ( <name_list> ) | [COMMON]
                    <assignment>
                    <library>
                    null

Figure 12-2: Syntax Diagram for Input Directives (continued)
Directives          Expanded Directives

<name_list>         <section_name> [,] { <section_name> }
<library>           -l<name>
<bind>              BIND | bind
<assignment>        <lside> <assign_op> <expr> <end>
<lside>             <name> | .
<assign_op>         = | += | -= | *= | /=
<end>               ; | ,
<expr>              <expr> <binary_op> <expr>
                    <term>
<binary_op>         * | / | %
                    + | -
                    >> | <<
                    == | != | > | < | <= | >=
                    &
                    |
                    &&
                    ||
<term>              <long>
                    <name>
                    <align> ( <term> )
                    ( <expr> )
                    <unary_op> <term>
                    <phy> ( <lside> )
                    <sizeof> ( <sectionname> )
                    <next> ( <long> )
                    <addr> ( <sectionname> )
<unary_op>          ! | -
<phy>               PHY | phy
<sizeof>            SIZEOF | sizeof

Figure 12-2: Syntax Diagram for Input Directives (continued)
Directives          Expanded Directives

<next>              NEXT | next
<addr>              ADDR | addr
<flags>             -e<wht_space><name>
                    -f<wht_space><long>
                    -h<wht_space><long>
                    -l<name>
                    -m
                    -o<wht_space><filename>
                    -r
                    -s
                    -t
                    -u<wht_space><name>
                    -z
                    -H
                    -L<path_name>
                    -M
                    -N
                    -S
                    -V
                    -VS<wht_space><long>
                    -a
                    -x
<name>              Any valid symbol name
<long>              Any valid long integer constant
<wht_space>         Blanks, tabs, and newlines
<filename>          Any valid UNIX operating system
                    filename. This may include a
                    full or partial path name.
<sectionname>       Any valid section name,
                    up to 8 characters
<path_name>         Any valid UNIX operating system
                    path name (full or partial)

Figure 12-2: Syntax Diagram for Input Directives (continued)
Introduction



The trend toward increased modularity of programs means that a project 
may have to cope with a large assortment of individual files. There may 
also be a wide range of generation procedures needed to turn the assort- 
ment of individual files into the final executable product. 

make(1) provides a method for maintaining up-to-date versions of
programs that consist of a number of files that may be generated in a variety of
ways.

An individual programmer can easily forget 

■ file-to-file dependencies 

■ files that were modified and the impact that has on other files 

■ the exact sequence of operations needed to generate a new version of 
the program 

In a description file, make keeps track of the commands that create files 
and the relationship between files. Whenever a change is made in any of 
the files that make up a program, the make command creates the finished 
program by recompiling only those portions directly or indirectly affected 
by the change. 

The basic operation of make is to 

■ find the target in the description file 

■ ensure that all the files on which the target depends, the files needed 
to generate the target, exist and are up to date 

■ create the target file if any of the generators have been modified 
more recently than the target 

The description file that holds the information on interfile dependencies 
and command sequences is conventionally called makefile, Makefile, or
s.[mM]akefile. If this naming convention is followed, the simple command
make is usually sufficient to regenerate the target regardless of the number
of files edited since the last make. In most cases, the description file is not
difficult to write and changes infrequently. Even if only a single file has 
been edited, rather than typing all the commands to regenerate the target, 
typing the make command ensures the regeneration is done in the 
prescribed way. 
The basic operation of make is to update a target file by ensuring that
all of the files on which the target file depends exist and are up to date. 
The target file is regenerated if it has not been modified since the depen- 
dents were modified. The make program searches the graph of dependen- 
cies. The operation of make depends on its ability to find the date and time 
that a file was last modified. 
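
The timestamp comparison described above can be sketched in a few lines of Python. This is an illustration only, not the way make itself is implemented; the function names is_out_of_date and demo, and the use of os.path.getmtime to read the time of last modification, are assumptions made for the example.

```python
import os
import tempfile


def is_out_of_date(target, dependents):
    """Return True if target is missing or older than any dependent."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(d) > target_time for d in dependents)


def demo():
    """Exercise the check with a source file and an object file."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "x.c")
        obj = os.path.join(d, "x.o")
        open(src, "w").close()
        missing = is_out_of_date(obj, [src])   # x.o does not exist yet
        open(obj, "w").close()
        os.utime(src, (1000, 1000))            # source older than object
        os.utime(obj, (2000, 2000))
        current = is_out_of_date(obj, [src])
        os.utime(src, (3000, 3000))            # source edited afterwards
        stale = is_out_of_date(obj, [src])
        return missing, current, stale
```

The sketch treats a missing target as out of date, which corresponds to the requirement that the files "exist and are up to date."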

The make program operates using three sources of information: 

■ a user-supplied description file 

■ filenames and last-modified times from the file system 

■ built-in rules to bridge some of the gaps 

To illustrate, consider a simple example in which a program named 
prog is made by compiling and loading three C language files x.c, y.c, and
z.c with the math library. By convention, the output of the C language 
compilations will be found in files named x.o, y.o, and z.o. Assume that the 
files x.c and y.c share some declarations in a file named defs.h, but that z.c 
does not. That is, x.c and y.c have the line 

#include "defs.h"

The following specification describes the relationships and operations:

prog : x.o y.o z.o
	cc x.o y.o z.o -lm -o prog
x.o y.o : defs.h

If this information were stored in a file named makefile, the command 
make 

would perform the operations needed to regenerate prog after any changes 
had been made to any of the four source files x.c, y.c, z.c, or defs.h. In the 
example above, the first line states that prog depends on three .o files. 
Once these object files are current, the second line describes how to load 
them to create prog. The third line states that x.o and y.o depend on the 
file defs.h. From the file system, make discovers that there are three .c files 
corresponding to the needed .o files and uses built-in rules on how to
generate an object from a C source file (i.e., issue a cc -c command).
If make did not have the ability to determine automatically what needs
to be done, the following longer description file would be necessary:

prog :  x.o y.o z.o
	cc x.o y.o z.o -lm -o prog
x.o :   x.c defs.h
	cc -c x.c
y.o :   y.c defs.h
	cc -c y.c
z.o :   z.c
	cc -c z.c



If none of the source or object files have changed since the last time 
prog was made, and all of the files are current, the command make 
announces this fact and stops. If, however, the defs.h file has been edited, 
x.c and y.c (but not z.c) are recompiled; and then prog is created from the
new x.o and y.o files, and the existing z.o file. If only the file y.c had 
changed, only it is recompiled; but it is still necessary to reload prog. If no 
target name is given on the make command line, the first target mentioned 
in the description is created; otherwise, the specified targets are made. The 
command 

make x.o 

would regenerate x.o if x.c or defs.h had changed. 
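
The recompilation decisions in the preceding paragraph can be traced with a small model of the dependency graph. The sketch below is hypothetical: it replaces real file times with the integers 1, 2, and 3, and the names DEPENDS, update, and remade_after_edit are inventions for the example rather than anything inside make.

```python
# Dependency graph taken from the longer description file for prog.
DEPENDS = {
    "prog": ["x.o", "y.o", "z.o"],
    "x.o": ["x.c", "defs.h"],
    "y.o": ["y.c", "defs.h"],
    "z.o": ["z.c"],
}


def update(target, times, clock, rebuilt):
    """Bring target up to date and return its modification time."""
    deps = DEPENDS.get(target, [])
    dep_times = [update(d, times, clock, rebuilt) for d in deps]
    # A target is remade when any of its dependents is newer than it is.
    if deps and any(t > times[target] for t in dep_times):
        times[target] = clock
        rebuilt.append(target)
    return times[target]


def remade_after_edit(edited_file):
    """List the targets remade, in order, after one file is edited."""
    names = set(DEPENDS) | {d for deps in DEPENDS.values() for d in deps}
    times = {name: 1 for name in names}   # everything current at time 1
    times[edited_file] = 2                # the edit happens at time 2
    rebuilt = []
    update("prog", times, 3, rebuilt)     # rebuilding happens at time 3
    return rebuilt
```

With this model, remade_after_edit("defs.h") yields x.o, y.o, and prog, while remade_after_edit("y.c") yields only y.o and prog, matching the cases described in the text.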

A method often useful to programmers is to include rules with 
mnemonic names and commands that do not actually produce a file with 
that name. These entries can take advantage of make's ability to generate 
files and substitute macros (for information about macros, see "Description 
Files and Substitutions" below.) Thus, an entry "save" might be included to 
copy a certain set of files, or an entry "clean" might be used to throw away 
unneeded intermediate files. 
If a file exists after such commands are executed, the file's time of last
modification is used in further decisions. If the file does not exist after the 
commands are executed, the current time is used in making further deci- 
sions. 

You can maintain a zero-length file purely to keep track of the time at 
which certain actions were performed. This technique is useful for main- 
taining remote archives and listings. 

A simple macro mechanism for substitution in dependency lines and 
command strings is used by make. Macros can either be defined by 
command-line arguments or included in the description file. In either case, 
a macro consists of a name followed by an equals sign followed by what the 
macro stands for. A macro is invoked by preceding the name by a dollar 
sign. Macro names longer than one character must be parenthesized. The 
following are valid macro invocations: 




$2
$(xy)
$Z
$(Z)




The last two are equivalent. 

$*, $@, $?, and $< are four special macros that change values during
the execution of the command. (These four macros are described later in 
this chapter under "Description Files and Substitutions.") The following 
fragment shows assignment and use of some macros: 
OBJECTS = x.o y.o z.o
LIBES = -lm
prog: $(OBJECTS)
	cc $(OBJECTS) $(LIBES) -o prog




The command 

make LIBES="-ll -lm"

loads the three objects with both the lex (-ll) and the math (-lm) libraries,
because macro definitions on the command line override definitions in the 
description file. (In UNIX system commands, arguments with embedded 
blanks must be quoted.) 
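
As a rough illustration of macro substitution and of the rule that command-line definitions take precedence, consider the following sketch. It is a simplified model, not make's parser: the helper names expand and effective_macros are hypothetical, nested macro references are not handled, and the special macros such as $@ and $? are deliberately left out.

```python
import re

# Matches $(NAME) or a single-character reference such as $Z or $2.
MACRO_REF = re.compile(r"\$\(([^)]+)\)|\$(\w)")


def expand(text, macros):
    """Replace $(NAME) and single-character $X references in text."""
    def lookup(match):
        name = match.group(1) or match.group(2)
        # A macro that is never defined has the null string as its value.
        return macros.get(name, "")
    return MACRO_REF.sub(lookup, text)


def effective_macros(file_macros, command_line_macros):
    """Command-line definitions override those in the description file."""
    merged = dict(file_macros)
    merged.update(command_line_macros)
    return merged
```

For example, merging the description-file value LIBES = -lm with the command-line value LIBES="-ll -lm" and expanding the command line cc $(OBJECTS) $(LIBES) -o prog produces a command that loads both libraries.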

As an example of the use of make, a description file that might be used 
to maintain the make command itself is given. The code for make is spread 
over a number of C language source files and has a yacc grammar. The 
description file contains the following: 



# Description file for the make command

FILES = Makefile defs.h main.c doname.c misc.c \
	files.c dosys.c gram.y
OBJECTS = main.o doname.o misc.o files.o \
	dosys.o gram.o
LIBES = -lld
LINT = lint -p
CFLAGS = -O
LP = /usr/bin/lp

make: $(OBJECTS)
	$(CC) $(CFLAGS) $(OBJECTS) $(LIBES) -o make
	@size make

$(OBJECTS): defs.h

cleanup:
	-rm *.o gram.c
	-du

install:
	@size make /usr/bin/make
	cp make /usr/bin/make && rm make

lint : dosys.c doname.c files.c main.c misc.c gram.c
	$(LINT) dosys.c doname.c files.c main.c misc.c \
	gram.c

# print files that are out-of-date
# with respect to "print" file.

print: $(FILES)
	pr $? | $(LP)
	touch print

The make program prints out each command before issuing it. 
The following output results from typing the command make in a
directory containing only the source and description files:

cc -O -c doname.c
cc -O -c misc.c
cc -O -c files.c
cc -O -c dosys.c
yacc gram.y
mv y.tab.c gram.c
cc -O -c gram.c
cc main.o doname.o misc.o files.o dosys.o
	gram.o -lld -o make
13188 + 3348 + 3044 = 19580




The string of digits results from the size make command. The printing of 
the command line itself was suppressed by an at sign, @, in the description 
file. 






Description Files and Substitutions 

The following section will explain the customary elements of the 
description file. 



Comments 

The comment convention is that a sharp, #, and all characters on the 
same line after a sharp are ignored. Blank lines and lines beginning with a 
sharp are totally ignored. 



Continuation Lines 

If a noncomment line is too long, the line can be continued by using a 
backslash. If the last character of a line is a backslash, then the backslash, 
the new line, and all following blanks and tabs are replaced by a single 
blank. 
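
The continuation rule can be expressed as a single substitution. The following is a minimal sketch of the rule as stated above, assuming the whole description file is held in one string; the function name join_continuations is hypothetical.

```python
import re


def join_continuations(text):
    """Replace a backslash, the newline after it, and all following
    blanks and tabs by a single blank."""
    return re.sub(r"\\\n[ \t]*", " ", text)
```

A definition such as FILES = a.c followed by a backslash, a newline, and an indented b.c is therefore read as the single line FILES = a.c b.c.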



Macro Definitions 

A macro definition is an identifier followed by an equal sign. The 
identifier must not be preceded by a colon or a tab. The name (string of 
letters and digits) to the left of the equal sign (trailing blanks and tabs are 
stripped) is assigned the string of characters following the equal sign (lead- 
ing blanks and tabs are stripped). The following are valid macro 
definitions: 

2 = xyz
abc = -ll -ly -lm
LIBES =

The last definition assigns LIBES the null string. A macro that is never 
explicitly defined has the null string as its value. Remember, however, that 
some macros are explicitly defined in make's own rules. (See Figure 13-2 at 
the end of the chapter.) 
General Form

The general form of an entry in a description file is

target1 [target2 ...] :[:] [dependent1 ...] [; commands] [# ...]
[ \t commands] [# ...]

Items inside brackets may be omitted and targets and dependents are 
strings of letters, digits, periods, and slashes. Shell metacharacters such as *
and ? are expanded when the line is evaluated. Commands may appear 
either after a semicolon on a dependency line or on lines beginning with a 
tab immediately following a dependency line. A command is any string of 
characters not including a sharp, #, except when the sharp is in quotes. 



Dependency Information 

A dependency line may have either a single or a double colon. A target 
name may appear on more than one dependency line, but all of those lines 
must be of the same (single or double colon) type. For the more common 
single-colon case, a command sequence may be associated with at most one 
dependency line. If the target is out of date with any of the dependents on 
any of the lines and a command sequence is specified (even a null one fol- 
lowing a semicolon or tab), it is executed; otherwise, a default rule may be 
invoked. In the double-colon case, a command sequence may be associated 
with more than one dependency line. If the target is out of date with any 
of the files on a particular line, the associated commands are executed. A 
built-in rule may also be executed. The double colon form is particularly 
useful in updating archive-type files, where the target is the archive library 
itself. (An example is included in the "Archive Libraries" section later in 
this chapter.) 



Executable Commands 

If a target must be created, the sequence of commands is executed. Nor- 
mally, each command line is printed and then passed to a separate invoca- 
tion of the shell after substituting for macros. The printing is suppressed in 
the silent mode (-s option of the make command) or if the command line
in the description file begins with an @ sign. make normally stops if any
command signals an error by returning a nonzero error code. Errors are
ignored if the -i flag has been specified on the make command line, if the
fake target name .IGNORE appears in the description file, or if the com- 
mand string in the description file begins with a hyphen. If a program is 
known to return a meaningless status, a hyphen in front of the command 
that invokes it is appropriate. Because each command line is passed to a 
separate invocation of the shell, care must be taken with certain commands 
(e.g., cd and shell control commands) that have meaning only within a sin- 
gle shell process. These results are forgotten before the next line is exe- 
cuted. 
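
The effect of passing each command line to a separate shell can be demonstrated outside make. The sketch below is an assumption-laden illustration: it uses Python's subprocess module to start one sh process per line, which imitates the behavior described above but is not how make is implemented, and the function name run_like_make is hypothetical.

```python
import subprocess
import tempfile


def run_like_make(command_lines, cwd):
    """Run each command line in its own shell and collect the output."""
    output = []
    for line in command_lines:
        result = subprocess.run(["sh", "-c", line], cwd=cwd,
                                capture_output=True, text=True)
        output.append(result.stdout.strip())
    return output


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as start:
        # The cd on the first line is forgotten before pwd runs.
        print(run_like_make(["cd /", "pwd"], start))
        # Joined with a semicolon, both commands share one shell.
        print(run_like_make(["cd / ; pwd"], start))
```

This is why commands that must share a directory change are written on one line, separated by semicolons.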

Before issuing any command, certain internally maintained macros are 
set. The $@ macro is set to the full target name of the current target. The 
$@ macro is evaluated only for explicitly named dependencies. The $? 
macro is set to the string of names that were found to be younger than the 
target. The $? macro is evaluated when explicit rules from the makefile are 
evaluated. If the command was generated by an implicit rule, the $< macro
is the name of the related file that caused the action; and the $* macro is the
prefix shared by the current and the dependent filenames. If a file must be
made but there are no explicit commands or relevant built-in rules, the
commands associated with the name .DEFAULT are used. If there is no such
name, make prints a message and stops.

In addition, a description file may also use the following related macros:
$(@D), $(@F), $(*D), $(*F), $(<D), and $(<F) (see below).



Extensions of $*, $@, and $< 

The internally generated macros $*, $@, and $< are useful generic
terms for current targets and out-of-date relatives. To this list has been
added the following related macros: $(@D), $(@F), $(*D), $(*F), $(<D), and
$(<F). The D refers to the directory part of the single character macro. 
The F refers to the filename part of the single character macro. These addi- 
tions are useful when building hierarchical makefiles. They allow access to 
directory names for purposes of using the cd command of the shell. Thus, a 
command can be 

cd $(<D); $(MAKE) $(<F) 
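
The D and F forms split a name in the same way that a path is split into a directory and a final component. A minimal sketch follows; the function names are hypothetical, and the choice of "." for a name with no directory part is an assumption made for the example rather than something stated in this chapter.

```python
import posixpath


def directory_part(name):
    """Model the D form: the directory part of a macro's value."""
    head = posixpath.dirname(name)
    # Assumption: a name without a directory part refers to ".".
    return head if head else "."


def file_part(name):
    """Model the F form: the filename part of a macro's value."""
    return posixpath.basename(name)
```

If $< were /usr/src/cmd/makefile, the command shown above would change to /usr/src/cmd and invoke $(MAKE) on makefile.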
Output Translations

Macros in shell commands are translated when evaluated. The form is
as follows:

$(macro:string1=string2)

The meaning of $(macro) is evaluated. For each appearance of string1 in
the evaluated macro, string2 is substituted. The meaning of finding string1
in $(macro) is that the evaluated $(macro) is considered as a series of strings
each delimited by white space (blanks or tabs). Thus, the occurrence of
string1 in $(macro) means that a regular expression of the following form
has been found:

.*<string1>[TAB|BLANK]

This particular form was chosen because make usually concerns itself 
with suffixes. The usefulness of this type of translation occurs when main- 
taining archive libraries. Now, all that is necessary is to accumulate the 
out-of-date members and write a shell script, which can handle all the C 
language programs (i.e., those files ending in .c). Thus, the following frag- 
ment optimizes the executions of make for maintaining an archive library: 

$(LIB): $(LIB)(a.o) $(LIB)(b.o) $(LIB)(c.o)
	$(CC) -c $(CFLAGS) $(?:.o=.c)
	$(AR) $(ARFLAGS) $(LIB) $?
	rm $?

A dependency of the preceding form is necessary for each of the 
different types of source files (suffixes) that define the archive library. 
These translations are added in an effort to make more general use of the 
wealth of information that make generates. 
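
The translation $(macro:string1=string2) can be modeled as a substitution applied to the end of each white-space-delimited word. The sketch below is one reading of the description above, offered as an illustration; the function name translate is hypothetical, and the result is rejoined with single blanks for simplicity.

```python
def translate(value, string1, string2):
    """Model $(macro:string1=string2) on an already evaluated macro."""
    words = []
    for word in value.split():
        # string1 must appear at the end of a word to be replaced.
        if string1 and word.endswith(string1):
            word = word[: len(word) - len(string1)] + string2
        words.append(word)
    return " ".join(words)
```

Applied to a $? value of a.o b.o c.o, the translation $(?:.o=.c) gives a.c b.c c.c, which is the list of sources handed to $(CC) in the fragment above.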
Another feature of make concerns the environment and recursive
invocations. If the sequence $(MAKE) appears anywhere in a shell command
line, the line is executed even if the -n flag is set. Since the -n flag is 
exported across invocations of make (through the MAKEFLAGS variable), 
the only thing that is executed is the make command itself. This feature is 
useful when a hierarchy of makefile(s) describes a set of software subsys- 
tems. For testing purposes, make -n can be executed and everything 
that would have been done will be printed including output from lower 
level invocations of make. 



Suffixes and Transformation Rules 

make uses an internal table of rules to learn how to transform a file 
with one suffix into a file with another suffix. If the -r flag is used on the 
make command line, the internal table is not used. 

The list of suffixes is actually the dependency list for the name
.SUFFIXES. make searches for a file with any of the suffixes on the list. If it
finds one, make transforms it into a file with another suffix. The
transformation rule names are the concatenation of the before and after suffixes.
The name of the rule to transform a .r file to a .o file is thus .r.o. If the rule
is present and no explicit command sequence has been given in the user's
description files, the command sequence for the rule .r.o is used. If a
command is generated by using one of these suffixing rules, the macro $* is
given the value of the stem (everything but the suffix) of the name of the 
file to be made; and the macro $< is the full name of the dependent that 
caused the action. 

The order of the suffix list is significant since the list is scanned from 
left to right. The first name formed that has both a file and a rule associated 
with it is used. If new names are to be appended, the user can add an entry 
for .SUFFIXES in the description file. The dependents are added to the 
usual list. A .SUFFIXES line without any dependents deletes the current 
list. It is necessary to clear the current list if the order of names is to be 
changed. 
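
The left-to-right scan of the suffix list can be sketched as follows. This is a simplified model under stated assumptions: the file system is replaced by a set of names, rules are identified only by their names, and the function name choose_rule is hypothetical.

```python
def choose_rule(target, suffixes, rules, existing_files):
    """Return (rule_name, source_file) for the first suffix that has
    both a file and a rule associated with it, or None."""
    stem, dot, ext = target.rpartition(".")
    if not dot:
        return None
    target_suffix = "." + ext
    for suffix in suffixes:              # the list is scanned left to right
        rule = suffix + target_suffix    # for example ".c" + ".o" gives ".c.o"
        source = stem + suffix           # for example "x" + ".c" gives "x.c"
        if rule in rules and source in existing_files:
            return rule, source
    return None
```

Because the scan stops at the first match, placing .c before .l in the list means that x.c is preferred over x.l when both exist.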
Implicit Rules

make uses a table of suffixes and a set of transformation rules to supply
default dependency information and implied commands. The default suffix
list is as follows:

.o      Object file
.c      C source file
.c~     SCCS C source file
.f      FORTRAN source file
.f~     SCCS FORTRAN source file
.s      Assembler source file
.s~     SCCS Assembler source file
.y      yacc source grammar
.y~     SCCS yacc source grammar
.l      lex source grammar
.l~     SCCS lex source grammar
.h      Header file
.h~     SCCS header file
.sh     Shell file
.sh~    SCCS shell file

Figure 13-1 summarizes the default transformation paths. If there are two 
paths connecting a pair of suffixes, the longer one is used only if the inter- 
mediate file exists or is named in the description. 
[Diagram not reproduced.]

Figure 13-1: Summary of Default Transformation Path



If the file x.o is needed and an x.c is found in the description or direc- 
tory, the x.o file would be compiled. If there is also an x.l, that source file 
would be run through lex before compiling the result. However, if there is 
no x.c but there is an x.l, make would discard the intermediate C language
file and use the direct link as shown in Figure 13-1. 

It is possible to change the names of some of the compilers used in the 
default or the flag arguments with which they are invoked by knowing the 
macro names used. The compiler names are the macros AS, CC, F77, YACC, 
and LEX. The command 

make CC=newcc 

will cause the newcc command to be used instead of the usual C language 
compiler. The macros ASFLAGS, CFLAGS, F77FLAGS, YFLAGS, and 
LFLAGS may be set to cause these commands to be issued with optional 
flags. Thus 

make "CFLAGS=-g"

causes the cc command to include debugging information. 
Archive Libraries

The make program has an interface to archive libraries. A user may
name a member of a library in the following manner:

projlib(object.o)

or

projlib((entrypt))

where the second method actually refers to an entry point of an object file
within the library. (make looks through the library, locates the entry point,
and translates it to the correct object filename.)

To use this procedure to maintain an archive library, the following type 
of makefile is required: 

projlib:: projlib(pfile1.o)
	$(CC) -c -O pfile1.c
	$(AR) $(ARFLAGS) projlib pfile1.o
	rm pfile1.o
projlib:: projlib(pfile2.o)
	$(CC) -c -O pfile2.c
	$(AR) $(ARFLAGS) projlib pfile2.o
	rm pfile2.o

. . . and so on for each object . . .

This is tedious and error prone. Obviously, the command sequences for 
adding a C language file to a library are the same for each invocation; the 
filename being the only difference each time. (This is true in most cases.) 

The make command also gives the user access to a rule for building 
libraries. The handle for the rule is the .a suffix. Thus, a .c.a rule is the 
rule for compiling a C language source file, adding it to the library, and 
removing the .o cadaver. Similarly, the .y.a, the .s.a, and the .l.a rules
rebuild yacc, assembler, and lex files, respectively. The archive rules
defined internally are .c.a, .c~.a, .f.a, .f~.a, and .s~.a. (The tilde (~) syntax
will be described shortly.) The user may define other needed rules in the
description file.

The above two-member library is then maintained with the following 
shorter makefile: 

projlib: projlib(pfile1.o) projlib(pfile2.o)
	@echo projlib up-to-date.

The internal rules are already defined to complete the preceding library 
maintenance. The actual .c.a rule is as follows:

.c.a:
	$(CC) -c $(CFLAGS) $<
	$(AR) $(ARFLAGS) $@ $*.o
	rm -f $*.o

Thus, the $@ macro is the .a target (projlib); the $< and $* macros are set
to the out-of-date C language file and the filename minus the suffix,
respectively (pfile1.c and pfile1). The $< macro (in the preceding rule) could
have been changed to $*.c.

It might be useful to go into some detail about exactly what make does 
when it sees the construction 

projlib: projlib(pfile1.o)
	@echo projlib up-to-date

Assume the object in the library is out of date with respect to pfile1.c. Also,
there is no pfile1.o file.

1. make projlib.

2. Before making projlib, check each dependent of projlib.

3. projlib(pfile1.o) is a dependent of projlib and needs to be
   generated.

4. Before generating projlib(pfile1.o), check each dependent of
   projlib(pfile1.o). (There are none.)

5. Use internal rules to try to create projlib(pfile1.o). (There is no
   explicit rule.) Note that projlib(pfile1.o) has a parenthesis in the
   name to identify the target suffix as .a. This is the key. There is no
   explicit .a at the end of the projlib library name. The parenthesis
   implies the .a suffix. In this sense, the .a is hard-wired into make.

6. Break the name projlib(pfile1.o) up into projlib and pfile1.o.
   Define two macros, $@ (=projlib) and $* (=pfile1).

7. Look for a rule .X.a and a file $*.X. The first .X (in the .SUFFIXES
   list) which fulfills these conditions is .c so the rule is .c.a, and the
   file is pfile1.c. Set $< to be pfile1.c and execute the rule. In fact,
   make must then compile pfile1.c.

8. The library has been updated. Execute the command associated
   with the projlib: dependency; namely

   @echo projlib up-to-date

It should be noted that to let pfile1.c have dependencies, the following
syntax is required:

projlib(pfile1.o): $(INCDIR)/stdio.h pfile1.c

There is also a macro for referencing the archive member name when this
form is used. The $% macro is evaluated each time $@ is evaluated. If
there is no current archive member, $% is null. If an archive member exists,
then $% evaluates to the expression between the parentheses.
The syntax of make does not directly permit referencing of prefixes.
For most types of files on UNIX operating system machines, this is accept- 
able since nearly everyone uses a suffix to distinguish different types of 
files. The SCCS files are the exception. Here, s. precedes the filename part 
of the complete path name. 

To allow make easy access to the prefix s., the tilde (~) is used as an
identifier of SCCS files. Hence, .c~.o refers to the rule which transforms an
SCCS C language source file into an object file. Specifically, the internal
rule is

.c~.o:
	$(GET) $(GFLAGS) $<
	$(CC) $(CFLAGS) -c $*.c
	-rm -f $*.c

Thus, the tilde appended to any suffix transforms the file search into an 
SCCS filename search with the actual suffix named by the dot and all char- 
acters up to (but not including) the tilde. 
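
The way a tilde suffix turns a suffix search into an SCCS filename search can be sketched as a small name-mapping function. This is an illustration of the naming convention only; the function name sccs_source is hypothetical, and the handling of a directory part is an assumption based on the statement that s. precedes the filename part of the path name.

```python
import posixpath


def sccs_source(stem, suffix):
    """Return the filename searched for, given a stem and a suffix
    that may end in a tilde."""
    if suffix.endswith("~"):
        directory, base = posixpath.split(stem)
        # Drop the tilde and put the s. prefix before the filename part.
        return posixpath.join(directory, "s." + base + suffix[:-1])
    return stem + suffix
```

For the stem pfile1, the suffix .c~ therefore refers to the SCCS file s.pfile1.c, while the suffix .c refers to pfile1.c.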

The following SCCS suffixes are internally defined: 

.f~
.y~
.l~
.sh~
.h~

The following rules involving SCCS transformations are internally defined:

.c~.a:
.c~.o:
.f~.a:
.f~.f:
.s~.o:
.l~.l:
.h~.h:
Obviously, the user can define other rules and suffixes, which may prove 
useful. The tilde provides a handle on the SCCS filename format so that 
this is possible. 



The Null Suffix 

There are many programs that consist of a single source file. make
handles this case by the null suffix rule. Thus, to maintain the UNIX system
program cat, a rule in the makefile of the following form is needed: 

.c: 

	$(CC) $(CFLAGS) $< -o $@

In fact, this .c: rule is internally defined so no makefile is necessary at 
all. The user only needs to type 

make cat dd echo date 

(these are all UNIX system single-file programs) and all four C language 
source files are passed through the above shell command line associated 
with the .c: rule. The internally defined single suffix rules are 

.c: 
.f: 
.sh: 

Others may be added in the makefile by the user. 
Include Files

The make program has a capability similar to the #include directive of 
the C preprocessor. If the string include appears as the first seven letters of 
a line in a makefile and is followed by a blank or a tab, the rest of the line 
is assumed to be a filename, which the current invocation of make will 
read. Macros may be used in filenames. The file descriptors are stacked for 
reading include files so that no more than 16 levels of nested includes are 
supported. 

SCCS Makefiles

Makefiles under SCCS control are accessible to make. That is, if make is
typed and only a file named s.makefile or s.Makefile exists, make will do a
get on the file, then read and remove the file.



Dynamic Dependency Parameters 

The $$@ parameter has meaning only on the dependency line in a makefile.
The $$@ refers to the current "thing" to the left of the colon (which is $@). 
Also the form $$(@F) exists, which allows access to the file part of $@. 
Thus, in the following: 

cat: $$@.c

the dependency is translated at execution time to the string cat.c. This is 
useful for building a large number of executable files, each of which has 
only one source file. For instance, the UNIX software command directory 
could have a makefile like: 

CMDS = cat dd echo date cmp comm chown

$(CMDS): $$@.c
	$(CC) -O $? -o $@
Obviously, this is a subset of all the single file programs. For multiple
file programs, a directory is usually allocated and a separate makefile is 
made. For any particular file that has a peculiar compilation procedure, a 
specific entry must be made in the makefile. 

The second useful form of the dependency parameter is $$(@F). It 
represents the filename part of $$@. Again, it is evaluated at execution 
time. Its usefulness becomes evident when trying to maintain the 
/usr/include directory from a makefile in the /usr/src/head directory. 
Thus, the /usr/src/head/makefile would look like 



INCDIR = /usr/include

INCLUDES = \
	$(INCDIR)/stdio.h \
	$(INCDIR)/pwd.h \
	$(INCDIR)/dir.h \
	$(INCDIR)/a.out.h

$(INCLUDES): $$(@F)
	cp $? $@
	chmod 0444 $@



This would completely maintain the /usr/include directory whenever 
one of the above files in /usr/src/head was updated. 
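
The per-target expansion of $$@ and $$(@F) on a dependency line can be modeled as a simple text substitution carried out once for each target. The sketch below is an illustration under that assumption; the function name expand_dynamic is hypothetical and the other macro forms are ignored.

```python
import posixpath


def expand_dynamic(targets, dependency_pattern):
    """Expand $$@ and $$(@F) separately for every target on the line."""
    result = {}
    for target in targets:
        # $$(@F) stands for the filename part of the current target.
        dep = dependency_pattern.replace("$$(@F)", posixpath.basename(target))
        # $$@ stands for the current target itself.
        dep = dep.replace("$$@", target)
        result[target] = dep
    return result
```

With the targets cat and dd and the pattern $$@.c, each program depends on its own source file; with the target /usr/include/stdio.h and the pattern $$(@F), the dependent is stdio.h in the current directory.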
The make command description is found under make(1) in the
Programmer's Reference Manual.



The make Command 

The make command takes macro definitions, options, description 
filenames, and target filenames as arguments in the form: 

make [ options ] [ macro definitions ] [ targets ] 

The following summary of command operations explains how these 
arguments are interpreted. 

First, all macro definition arguments (arguments with embedded equal 
signs) are analyzed and the assignments made. Command-line macros over- 
ride corresponding definitions found in the description files. Next, the 
option arguments are examined. The permissible options are as follows: 

-i   Ignore error codes returned by invoked commands. This mode is
     entered if the fake target name .IGNORE appears in the description
     file.

-s   Silent mode. Do not print command lines before executing. This
     mode is also entered if the fake target name .SILENT appears in the
     description file.

-r   Do not use the built-in rules.

-n   No execute mode. Print commands, but do not execute them. Even
     lines beginning with an @ sign are printed.

-t   Touch the target files (causing them to be up to date) rather than
     issue the usual commands.

-q   Question. The make command returns a zero or nonzero status
     code depending on whether the target file is or is not up to date.

-p   Print out the complete set of macro definitions and target
     descriptions.

-k   Abandon work on the current entry if something goes wrong, but
     continue on other branches that do not depend on the current
     entry.

-e   Environment variables override assignments within makefiles.

-f   Description filename. The next argument is assumed to be the name
     of a description file. A filename of - denotes the standard input. If
     there are no -f arguments, the file named makefile or Makefile or
     s.[mM]akefile in the current directory is read. The contents of the
     description files override the built-in rules if they are present.

The following two arguments are evaluated in the same manner as flags: 

.DEFAULT If a file must be made but there are no explicit com- 
mands or relevant built-in rules, the commands associ- 
ated with the name .DEFAULT are used if it exists. 

.PRECIOUS Dependents on this target are not removed when quit or 
interrupt is pressed. 

Finally, the remaining arguments are assumed to be the names of targets
to be made, and they are made in left-to-right order. If there
are no such arguments, the first name in the description file that does not
begin with a period is made.



Environment Variables 

Environment variables are read and added to the macro definitions each 
time make executes. Precedence is a prime consideration in doing this 
properly. The following describes make's interaction with the environ- 
ment. A macro, MAKEFLAGS, is maintained by make. The macro is 
defined as the collection of all input flag arguments into a string (without 
minus signs). The macro is exported and thus accessible to further invoca- 
tions of make. Command line flags and assignments in the makefile update 
MAKEFLAGS. Thus, to describe how the environment interacts with make, 
the MAKEFLAGS macro (environment variable) must be considered. 

When executed, make assigns macro definitions in the following order: 

1. Read the MAKEFLAGS environment variable. If it is not present or 
null, the internal make variable MAKEFLAGS is set to the null 
string. Otherwise, each letter in MAKEFLAGS is assumed to be an 
input flag argument and is processed as such. (The only exceptions
are the -f, -p, and -r flags.)

2. Read the internal list of macro definitions.

3. Read the environment. The environment variables are treated as 
macro definitions and marked as exported (in the shell sense). 

4. Read the makefile(s). The assignments in the makefile(s) override
the environment. This order is chosen so that when a makefile is
read and executed, you know what to expect. That is, you get what
is seen unless the -e flag is used. The -e flag is the command-line
flag that tells make to have the environment override the makefile
assignments. Thus, if make -e ... is typed, the variables in the
environment override the definitions in the makefile. Also, MAKEFLAGS
overrides the environment if assigned. This is useful for further
invocations of make from the current makefile.

It may be clearer to list the precedence of assignments. Thus, in order 
from least binding to most binding, the precedence of assignments is as fol- 
lows: 

1. internal definitions

2. environment 

3. makefile(s) 

4. command line 

The -e flag has the effect of rearranging the order to:

1. internal definitions

2. makefile(s) 

3. environment 

4. command line 

This order is general enough to allow a programmer to define a makefile or 
set of makefiles whose parameters are dynamically definable. 
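
The two precedence orders above can be sketched as a small shell function. This is only a model of the rule, not part of make: the function name resolve_macro and its argument layout are assumptions made for illustration, and an empty string stands for "not defined by this source."

```shell
# Model of make's macro precedence (illustrative only, not part of make).
# Usage: resolve_macro USE_E INTERNAL ENVIRONMENT MAKEFILE CMDLINE
#   USE_E is "yes" when the -e flag is in effect, anything else otherwise.
#   Pass an empty string for a source that does not define the macro.
resolve_macro() {
    use_e=$1
    internal=$2
    environment=$3
    makefile=$4
    cmdline=$5

    # Start with the least binding source and let later ones override it.
    result=$internal
    if [ "$use_e" = yes ]; then
        # With -e the environment is applied after the makefile.
        if [ -n "$makefile" ]; then result=$makefile; fi
        if [ -n "$environment" ]; then result=$environment; fi
    else
        if [ -n "$environment" ]; then result=$environment; fi
        if [ -n "$makefile" ]; then result=$makefile; fi
    fi

    # The command line is always the most binding.
    if [ -n "$cmdline" ]; then result=$cmdline; fi
    printf '%s\n' "$result"
}
```

For example, resolve_macro no cc gcc c89 "" prints c89 because the makefile overrides the environment, while resolve_macro yes cc gcc c89 "" prints gcc because -e reverses those two sources.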



Suggestions and Warnings

The most common difficulties arise from make's specific meaning of 
dependency. If file x.c has a 

#include "defs.h" 

line, then the object file x.o depends on defs.h; the source file x.c does not. 
If defs.h is changed, nothing is done to the file x.c while file x.o must be 
recreated. 
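
The dependency described above can be tried out with a short script. This is a sketch under stated assumptions: the function name demo_dependency, the scratch directory, and the explicit cc -c x.c rule are all made up for the illustration, and mktemp is assumed to be available. Because only make -n is used, no compiler is needed.

```shell
# Sketch only: builds a scratch directory and asks make what it would do.
demo_dependency() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1

    # x.o depends on both x.c and defs.h; x.c itself depends on nothing.
    printf 'x.o: x.c defs.h\n\tcc -c x.c\n' > makefile
    printf '#include "defs.h"\n' > x.c
    : > x.o

    # Make x.c and x.o look old, then "edit" defs.h so it is newer than x.o.
    touch -t 200001010000 x.c x.o
    touch defs.h

    # -n prints the commands without running them.
    make -n x.o
    status=$?

    cd / && rm -rf "$dir"
    exit "$status"
)
```

Calling demo_dependency should show that make intends to recompile x.o, even though x.c itself was never changed.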

To discover what make would do, the -n option is very useful. The
command

make -n

orders make to print out the commands that make would issue without
actually taking the time to execute them. If a change to a file is absolutely
certain to be mild in character (e.g., adding a comment to an include file),
the -t (touch) option can save a lot of time. Instead of issuing a large
number of superfluous recompilations, make updates the modification times
on the affected files. Thus, the command

make -ts

(touch silently) causes the relevant files to appear up to date. Obvious care
is necessary because this mode of operation subverts the intention of make
and destroys all memory of the previous relationships.

The standard set of internal rules used by make is reproduced below.




#
# SUFFIXES RECOGNIZED BY MAKE
#

.SUFFIXES: .o .c .c~ .y .y~ .l .l~ .s .s~ .h .h~ .sh .sh~ .f .f~

#
# PREDEFINED MACROS
#

MAKE=make
AR=ar
AS=as
ASFLAGS=
CC=cc
CFLAGS=-O
F77=f77
F77FLAGS=
GET=get
GFLAGS=
LEX=lex
LFLAGS=
LD=ld
LDFLAGS=
YACC=yacc
YFLAGS=

#
# SINGLE SUFFIX RULES
#

.c:
	$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@

.c~:
	$(GET) $(GFLAGS) $<
	$(CC) $(CFLAGS) $(LDFLAGS) $*.c -o $*
	-rm -f $*.c

.f:
	$(F77) $(F77FLAGS) $(LDFLAGS) $< -o $@

.f~:
	$(GET) $(GFLAGS) $<
	$(F77) $(F77FLAGS) $(LDFLAGS) $< -o $*
	-rm -f $*.f

.sh:
	cp $< $@; chmod 0777 $@

.sh~:
	$(GET) $(GFLAGS) $<
	cp $*.sh $*; chmod 0777 $@
	-rm -f $*.sh

#
# DOUBLE SUFFIX RULES
#

.c~.c .s~.s .sh~.sh .y~.y .l~.l .h~.h:
	$(GET) $(GFLAGS) $<

.c.a:
	$(CC) -c $(CFLAGS) $<
	$(AR) $(ARFLAGS) $@ $*.o
	rm -f $*.o

.c~.a:
	$(GET) $(GFLAGS) $<
	$(CC) -c $(CFLAGS) $*.c
	$(AR) $(ARFLAGS) $@ $*.o
	rm -f $*.[co]

.c.o:
	$(CC) $(CFLAGS) -c $<

.c~.o:
	$(GET) $(GFLAGS) $<
	$(CC) $(CFLAGS) -c $*.c
	-rm -f $*.c

.f.a:
	$(F77) $(F77FLAGS) $(LDFLAGS) -c $*.f
	$(AR) $(ARFLAGS) $@ $*.o
	-rm -f $*.o

.f~.a:
	$(GET) $(GFLAGS) $<
	$(F77) $(F77FLAGS) $(LDFLAGS) -c $*.f
	$(AR) $(ARFLAGS) $@ $*.o
	-rm -f $*.[fo]

.f.o:
	$(F77) $(F77FLAGS) $(LDFLAGS) -c $*.f

.f~.o:
	$(GET) $(GFLAGS) $<
	$(F77) $(F77FLAGS) $(LDFLAGS) -c $*.f
	-rm -f $*.f

.s~.a:
	$(GET) $(GFLAGS) $<
	$(AS) $(ASFLAGS) -o $*.o $*.s
	$(AR) $(ARFLAGS) $@ $*.o
	-rm -f $*.[so]

.s.o:
	$(AS) $(ASFLAGS) -o $@ $<

.s~.o:
	$(GET) $(GFLAGS) $<
	$(AS) $(ASFLAGS) -o $*.o $*.s
	-rm -f $*.s

.l.c:
	$(LEX) $(LFLAGS) $<
	mv lex.yy.c $@

.l~.c:
	$(GET) $(GFLAGS) $<
	$(LEX) $(LFLAGS) $*.l
	mv lex.yy.c $@
	-rm -f $*.l

.l.o:
	$(LEX) $(LFLAGS) $<
	$(CC) $(CFLAGS) -c lex.yy.c
	rm lex.yy.c
	mv lex.yy.o $@

.l~.o:
	$(GET) $(GFLAGS) $<
	$(LEX) $(LFLAGS) $*.l
	$(CC) $(CFLAGS) -c lex.yy.c
	rm -f lex.yy.c $*.l
	mv lex.yy.o $*.o

.y.c:
	$(YACC) $(YFLAGS) $<
	mv y.tab.c $@

.y~.c:
	$(GET) $(GFLAGS) $<
	$(YACC) $(YFLAGS) $*.y
	mv y.tab.c $*.c
	-rm -f $*.y

.y.o:
	$(YACC) $(YFLAGS) $<
	$(CC) $(CFLAGS) -c y.tab.c
	rm y.tab.c
	mv y.tab.o $@

.y~.o:
	$(GET) $(GFLAGS) $<
	$(YACC) $(YFLAGS) $*.y
	$(CC) $(CFLAGS) -c y.tab.c
	rm -f y.tab.c $*.y
	mv y.tab.o $*.o



Figure 13-2: make Internal Rules 





Introduction



The Source Code Control System (SCCS) is a maintenance and enhance- 
ment tracking tool that runs under the UNIX system. SCCS takes custody 
of a file and, when changes are made, identifies and stores them in the file 
with the original source code and/or documentation. As other changes are 
made, they too are identified and retained in the file. 

Retrieval of the original or any set of changes is possible. Any version 
of the file as it develops can be reconstructed for inspection or additional 
modification. History data can be stored with each version: why the 
changes were made, who made them, when they were made. 

This guide covers the following: 

■ SCCS for Beginners: how to make, retrieve, and update an SCCS file 

■ Delta Numbering: how versions of an SCCS file are named 

■ SCCS Command Conventions: what rules apply to SCCS commands 

■ SCCS Commands: the fourteen SCCS commands and their more use- 
ful arguments 

■ SCCS Files: protection, format, and auditing of SCCS files 

Neither the implementation of SCCS nor the installation procedure for 
SCCS is described in this guide. 



SOURCE CODE CONTROL SYSTEM (SCCS) 14-1 



SCCS for Beginners



Several terminal session fragments are presented in this section. Try 
them all. The best way to learn SCCS is to use it. 



Terminology 

A delta is a set of changes made to a file under SCCS custody. To iden- 
tify and keep track of a delta, it is assigned an SID (SCCS IDentification) 
number. The SID for any original file turned over to SCCS is composed of 
release number 1 and level number 1, stated as 1.1. The SID for the first set
of changes made to that file, that is, its first delta, is release 1 level 2, or
1.2. The next delta would be 1.3, the next 1.4, and so on. More on delta
numbering later. At this point, it is enough to know that by default SCCS 
assigns SIDs automatically. 



Creating an SCCS File via admin

Suppose, for example, you have a file called lang that is simply a list of 
five programming language names. Use a text editor to create file lang con- 
taining the following list. 

C 

PL/1 

FORTRAN 

COBOL 

ALGOL 

Custody of your lang file can be given to SCCS using the admin com- 
mand (i.e., administer SCCS file). The following creates an SCCS file from 
the lang file: 

admin -ilang s.lang

All SCCS files must have names that begin with s., hence s.lang. The -i 
keyletter, together with its value lang, means admin is to create an SCCS 
file and initialize it with the contents of the file lang.

The admin command replies

No id keywords (cm7) 

This is a warning message that may also be issued by other SCCS com- 
mands. Ignore it for now. Its significance is described later with the get 
command under "SCCS Commands." In the following examples, this warn- 
ing message is not shown although it may be issued. 

Remove the lang file. It is no longer needed because it exists now 
under SCCS as s.lang. 

rm lang 



Retrieving a File via get 

Use the get command as follows:

get s.lang

This retrieves s.lang and prints

1.1
5 lines

This tells you that get retrieved version 1.1 of the file, which is made up of 
five lines of text. 

The retrieved text has been placed in a new file known as a "g.file."
SCCS forms the g.file name by deleting the prefix s. from the name of the
SCCS file. Thus, the original lang file has been recreated.

If you list, ls(1), the contents of your directory, you will see both lang
and s.lang. SCCS retains s.lang for use by other users.

The get s.lang command creates lang as read-only and keeps no
information regarding its creation. Because you are going to make changes to it,
get must be informed of your intention to do so. This is done as follows:

get -e s.lang

get -e causes SCCS to create lang for both reading and writing
(editing). It also places certain information about lang in another new file,
called the "p.file" (p.lang in this case), which is needed later by the delta
command.

get -e prints the same messages as get, except that now the SID for the
first delta you will create is issued:

1.1
new delta 1.2
5 lines

Change lang by adding two more programming languages: 

SNOBOL 
ADA 



Recording Changes via delta 

Next, use the delta command as follows:

delta s.lang

delta then prompts with

comments?

Your response should be an explanation of why the changes were made.
For example,

added more languages

delta now reads the p.file, p.lang, and determines what changes you
made to lang. It does this by doing its own get to retrieve the original
version and applying the diff(1) command to the original version and the
edited version. Next, delta stores the changes in s.lang and destroys the
no-longer-needed p.lang and lang files.

When this process is complete, delta outputs

1.2
2 inserted
0 deleted
5 unchanged

The number 1.2 is the SID of the delta you just created, and the next
three lines summarize what was done to s.lang.



Additional Information about get 

The command

get s.lang

retrieves the latest version of the file s.lang, now 1.2. SCCS does this by
starting with the original version of the file and applying the delta you
made. If you use the get command now, any of the following will retrieve
version 1.2.

get s.lang
get -r1 s.lang
get -r1.2 s.lang

The numbers following -r are SIDs. When you omit the level number
of the SID (as in get -r1 s.lang), the default is the highest level number
that exists within the specified release. Thus, the second command requests
the retrieval of the latest version in release 1, namely 1.2. The third
command specifically requests the retrieval of a particular version, in this case
also 1.2.

Whenever a major change is made to a file, you may want to signify it 
by changing the release number, the first number of the SID. This, too, is 
done with the get command. 

get -e -r2 s.lang

Because release 2 does not exist, get retrieves the latest version before 
release 2. get also interprets this as a request to change the release number 
of the new delta to 2, thereby naming it 2.1 rather than 1.3. The output is 

1.2 

new delta 2.1 
7 lines 

which means version 1.2 has been retrieved, and 2.1 is the version delta
will create. If the file is now edited, for example, by deleting COBOL from
the list of languages, and delta is executed

delta s.lang
comments? deleted cobol from list of languages

you will see by delta's output that version 2.1 is indeed created.

2.1
0 inserted
1 deleted
6 unchanged

Deltas can now be created in release 2 (deltas 2.2, 2.3, etc.), or another 
new release can be created in a similar manner. 



The help Command 

If the command

get lang

is now executed, the following message will be output:

ERROR [lang]: not an SCCS file (co1)

The code co1 can be used with help to print a fuller explanation of the
message.

help co1

This gives the following explanation of why get lang produced an error
message:

co1:
"not an SCCS file"
A file that you think is an SCCS file
does not begin with the characters "s.".

help is useful whenever there is doubt about the meaning of almost any
SCCS message.



Delta Numbering

Think of deltas as the nodes of a tree in which the root node is the
original version of the file. The root is normally named 1.1 and deltas (nodes)
are named 1.2, 1.3, etc. The components of these SIDs are called release and
level numbers, respectively. Thus, normal naming of new deltas proceeds
by incrementing the level number. This is done automatically by SCCS
whenever a delta is made.

The user may change the release number to indicate a major change. The
new release number then applies to all subsequent deltas unless specifically
changed again. Thus, the evolution of a particular file could be represented
by Figure 14-1.




Figure 14-1: Evolution of an SCCS File 



This is the normal sequential development of an SCCS file, with each delta 
dependent on the preceding deltas. Such a structure is called the trunk of 
an SCCS tree. 

There are situations that require branching an SCCS tree. That is, 
changes are planned to a given delta that will not be dependent on all pre- 
vious deltas. For example, consider a program in production use at version 
1.3 and for which development work on release 2 is already in progress. 
Release 2 may already have a delta in progress as shown in Figure 14-1. 
Assume that a production user reports a problem in version 1.3 that cannot 
wait to be repaired in release 2. The changes necessary to repair the trouble 
will be applied as a delta to version 1.3 (the version in production use). 
This creates a new version that will then be released to the user but will not
affect the changes being applied for release 2 (i.e., deltas 1.4, 2.1, 2.2, etc.). 
This new delta is the first node of a new branch of the tree. 

Branch delta names always have four SID components: the same release 
number and level number as the trunk delta, plus a branch number and 
sequence number. The format is as follows: 

release.level.branch.sequence

The branch number of the first delta branching off any trunk delta is
always 1, and its sequence number is also 1. For example, the full SID for a 
delta branching off trunk delta 1.3 will be 1.3.1.1. As other deltas on that 
same branch are created, only the sequence number changes: 1.3.1.2, 1.3.1.3, 
etc. This is shown in Figure 14-2. 
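
The numbering rules described so far can be written down as a few shell helpers. These are only a model of the naming scheme: SCCS assigns SIDs itself, and the function names below are hypothetical ones chosen for this sketch.

```shell
# Illustrative helpers that mimic SID naming; they are not SCCS commands.

# next_trunk_sid RELEASE.LEVEL  ->  RELEASE.(LEVEL+1)
next_trunk_sid() {
    release=${1%%.*}
    level=${1#*.}
    echo "$release.$((level + 1))"
}

# new_release_sid RELEASE  ->  RELEASE.1 (first delta of a new release)
new_release_sid() {
    echo "$1.1"
}

# first_branch_sid RELEASE.LEVEL  ->  RELEASE.LEVEL.1.1
# (first delta branching off the given trunk delta)
first_branch_sid() {
    echo "$1.1.1"
}

# next_branch_sid R.L.B.S  ->  R.L.B.(S+1)
# (next delta on the same branch; only the sequence number changes)
next_branch_sid() {
    prefix=${1%.*}
    sequence=${1##*.}
    echo "$prefix.$((sequence + 1))"
}
```

With these, next_trunk_sid 1.2 gives 1.3, first_branch_sid 1.3 gives 1.3.1.1, and next_branch_sid 1.3.1.1 gives 1.3.1.2, matching the examples in the text.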




Figure 14-2: Tree Structure with Branch Deltas 



The branch number is incremented only when a delta is created that 
starts a new branch off an existing branch, as shown in Figure 14-3. As this 
secondary branch develops, the sequence numbers of its deltas are incre- 
mented (1.3.2.1, 1.3.2.2, etc.), but the secondary branch number remains the 
same.




Figure 14-3: Extended Branching Concept 



The concept of branching may be extended to any delta in the tree, and 
the numbering of the resulting deltas proceeds as shown above. SCCS 
allows the generation of complex tree structures. Although this capability 
has been provided for certain specialized uses, the SCCS tree should be kept 
as simple as possible. Comprehension of its structure becomes difficult as
the tree becomes complex.



SCCS Command Conventions

SCCS commands accept two types of arguments:

■ keyletters 

■ filenames 

Keyletters are options that begin with a minus sign, followed by a 
lowercase letter and, in some cases, a value. 

File and/or directory names specify the file(s) the command is to
process. Naming a directory is equivalent to naming all the SCCS files within
the directory. Non-SCCS files and unreadable files (because of permission
modes via chmod(1)) in the named directories are silently ignored.

In general, filename arguments may not begin with a minus sign. If a
filename of - (a lone minus sign) is specified, the command will read the
standard input (usually your terminal) for lines and take each line as the
name of an SCCS file to be processed. The standard input is read until
end-of-file. This feature is often used in pipelines with, for example, the
commands find(1) or ls(1).

Keyletters are processed before filenames. Therefore, the placement of
keyletters is arbitrary; that is, they may be interspersed with filenames.
Filenames, however, are processed left to right. Somewhat different
conventions apply to help(1), what(1), sccsdiff(1), and val(1), detailed later
under "SCCS Commands."

Certain actions of various SCCS commands are controlled by flags
appearing in SCCS files. Some of these flags will be discussed, but for a
complete description see admin(1) in the Programmer's Reference Manual.

The distinction between real user (see passwd(1)) and effective user will
be of concern in discussing various actions of SCCS commands. For now,
assume that the real and effective users are the same: the person logged
into the UNIX system.



x.files and z.files

All SCCS commands that modify an SCCS file do so by writing a copy
called the "x.file." This is done to ensure that the SCCS file is not damaged
if processing terminates abnormally. SCCS names the x.file by replacing the
s. of the SCCS filename with x.. The x.file is created in the same directory
as the SCCS file, given the same mode (see chmod(1)), and is owned by the
effective user. When processing is complete, the old SCCS file is destroyed
and the modified x.file is renamed (x. is replaced by s.) and becomes the new
SCCS file.

To prevent simultaneous updates to an SCCS file, the same modifying 
commands also create a lock-file called the "z.file." SCCS forms its name by 
replacing the s. of the SCCS filename with a z. prefix. The z.file contains 
the process number of the command that creates it, and its existence 
prevents other commands from processing the SCCS file. The z.file is 
created with access permission mode 444 (read only) in the same directory 
as the SCCS file and is owned by the effective user. It exists only for the 
duration of the execution of the command that creates it. 

In general, users can ignore x.files and z.files. They are useful only in 
the event of system crashes or similar situations. 
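
The naming rule for these auxiliary files is purely mechanical, as the following sketch shows. The function sccs_aux_name is a hypothetical helper written for this illustration; SCCS derives the names internally. The same prefix replacement applies to the p.file described elsewhere in this chapter.

```shell
# Illustrative only: derive the name of an auxiliary file (x.file, z.file,
# p.file) from the name of an SCCS file by replacing the "s." prefix.
# Usage: sccs_aux_name PREFIX PATH-TO-S-FILE
sccs_aux_name() {
    prefix=$1
    path=$2
    dir=$(dirname "$path")
    base=$(basename "$path")

    # Every SCCS file name must begin with "s.".
    case $base in
        s.*) ;;
        *) echo "not an SCCS file name: $path" >&2; return 1 ;;
    esac

    # The auxiliary file lives in the same directory as the SCCS file.
    echo "$dir/$prefix.${base#s.}"
}
```

For instance, sccs_aux_name x /usr/src/s.lang yields /usr/src/x.lang and sccs_aux_name z /usr/src/s.lang yields /usr/src/z.lang.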



Error Messages 

SCCS commands produce error messages on the diagnostic output in 
this format: 

ERROR [name-of-file-being-processed]: message text (code)

The code in parentheses can be used as an argument to the help command
to obtain a further explanation of the message. Detection of a fatal error 
during the processing of a file causes the SCCS command to stop processing 
that file and proceed with the next file specified. 
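
Because the code always appears in parentheses at the end of the message, it is easy to pick out in a script. The following is a minimal sketch; the function name sccs_error_code is an assumption made for this example, and it relies only on the message format shown above.

```shell
# Illustrative only: print the code from an SCCS error message of the form
#   ERROR [file]: message text (code)
# Nothing is printed if the line does not match that format.
sccs_error_code() {
    printf '%s\n' "$1" |
        sed -n 's/^ERROR.*(\([^()]*\))[[:space:]]*$/\1/p'
}
```

The extracted code could then be passed to help, for example help "$(sccs_error_code "$message")", where message is a shell variable assumed to hold the captured diagnostic line.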



SCCS Commands

This section describes the major features of the fourteen SCCS commands
and their most common arguments. Full descriptions with details of
all arguments are in the Programmer's Reference Manual.

Here is a quick-reference overview of the commands: 



get        retrieves versions of SCCS files

unget      undoes the effect of a get -e prior to the file being deltaed

delta      applies deltas (changes) to SCCS files and creates new versions

admin      initializes SCCS files, manipulates their descriptive text, and
           controls delta creation rights

prs        prints portions of an SCCS file in user-specified format

sact       prints information about files that are currently out for edit

help       gives explanations of error messages

rmdel      removes a delta from an SCCS file; allows removal of deltas
           created by mistake

cdc        changes the commentary associated with a delta

what       searches any UNIX system file(s) for all occurrences of a
           special pattern and prints out what follows it; useful in finding
           identifying information inserted by the get command

sccsdiff   shows differences between any two versions of an SCCS file

comb       combines consecutive deltas into one to reduce the size of
           an SCCS file

val        validates an SCCS file

vc         a filter that may be used for version control



The get Command

The get(1) command creates a file that contains a specified version of an
SCCS file. The version is retrieved by beginning with the initial version
and then applying deltas, in order, until the desired version is obtained. 
The resulting file is called the "g.file." It is created in the current directory 
and is owned by the real user. The mode assigned to the g.file depends on 
how the get command is used. 

The most common use of get is 

get s.abc 

which normally retrieves the latest version of file abc from the SCCS file 
tree trunk and produces (for example) on the standard output 

1.3 

67 lines 

No id keywords (cm7)

meaning version 1.3 of file s.abc was retrieved (assuming 1.3 is the latest
trunk delta), it has 67 lines of text, and no ID keywords were substituted in 
the file. 

The generated g.file (file abc) is given access permission mode 444 (read 
only). This particular way of using get is intended to produce g.files only 
for inspection, compilation, etc. It is not intended for editing (making del- 
tas). 

When several files are specified, the same information is output for each 
one. For example, 

get s.abc s.xyz 

produces

1.3
67 lines
No id keywords (cm7)

1.7
85 lines
No id keywords (cm7)




ID Keywords 

In generating a g.file for compilation, it is useful to record the date and 
time of creation, the version retrieved, the module's name, etc. within the 
g.file. This information appears in a load module when one is eventually 
created. SCCS provides a convenient mechanism for doing this automati- 
cally. Identification (ID) keywords appearing anywhere in the generated 
file are replaced by appropriate values according to the definitions of those 
ID keywords. The format of an ID keyword is an uppercase letter enclosed 
by percent signs, %. For example, 

%I% 

is the ID keyword replaced by the SID of the retrieved version of a file.
Similarly, %H% is replaced by the current date and %M% by the name of the
g.file. Thus, executing get on an SCCS file that contains the PL/I
declaration,

DCL ID CHAR(100) VAR INIT('%M% %I% %H%');

gives (for example) the following:

DCL ID CHAR(100) VAR INIT('MODNAME 2.3 07/18/85');
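
The effect of the substitution can be imitated with sed. This is only an imitation for illustration: get performs the real substitution itself, the function name expand_id_keywords is hypothetical, and only the three keywords used in the example above are handled.

```shell
# Illustrative filter that mimics get's replacement of three ID keywords.
# Usage: expand_id_keywords MODULE SID DATE < input
# The values must not contain "|" or "&", which are special to sed here.
expand_id_keywords() {
    module=$1
    sid=$2
    date=$3
    sed -e "s|%M%|$module|g" \
        -e "s|%I%|$sid|g" \
        -e "s|%H%|$date|g"
}
```

Feeding the declaration above through expand_id_keywords MODNAME 2.3 07/18/85 reproduces the substituted line shown in the example.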

When no ID keywords are substituted by get, the following message is 
issued: 

No id keywords (cm7)

This message is normally treated as a warning by get although the
presence of the i flag in the SCCS file causes it to be treated as an error. For a
complete list of the approximately twenty ID keywords provided, see get(1)
in the Programmer's Reference Manual.

Retrieval of Different Versions 

By default, the version of an SCCS file that get retrieves is the most
recently created delta of the highest numbered trunk release. However, any
other version can be retrieved with get -r by specifying the version's SID.
Thus,

get -r1.3 s.abc

retrieves version 1.3 of file s.abc and produces (for example) on the standard
output

1.3
64 lines

A branch delta may be retrieved similarly,

get -r1.5.2.3 s.abc

which produces (for example) on the standard output

1.5.2.3
234 lines

When a SID is specified and the particular version does not exist in the
SCCS file, an error message results.

Omitting the level number, as in

get -r3 s.abc

causes retrieval of the trunk delta with the highest level number within the
given release. Thus, the above command might output

3.7
213 lines

If the given release does not exist, get retrieves the trunk delta with the
highest level number within the highest-numbered existing release that is
lower than the given release. For example, assume release 9 does not exist
in file s.abc and release 7 is the highest-numbered release below 9.
Executing

get -r9 s.abc

might produce

7.6
420 lines

which indicates that trunk delta 7.6 is the latest version of file s.abc below
release 9. Similarly, omitting the sequence number, as in

get -r4.3.2 s.abc

results in the retrieval of the branch delta with the highest sequence 
number on the given branch. (If the given branch does not exist, an error 
message results.) This might result in the following output: 

4.3.2.8 

89 lines 
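
The rule for a bare release number given earlier in this section (highest level in that release, or in the highest existing lower release) can be modeled in a few lines. This is a sketch of the selection rule only, not of how get is implemented; the function name resolve_release is hypothetical, and the list of existing SIDs is assumed to be supplied one per line on the standard input.

```shell
# Illustrative only: given the SIDs that exist (one per line on stdin) and
# a requested release number, print the trunk SID that get -rRELEASE
# would select according to the rule in the text.
resolve_release() {
    wanted=$1
    awk -F. -v want="$wanted" '
        # Trunk deltas have exactly two components; branch deltas are skipped.
        NF == 2 && $1 + 0 <= want + 0 {
            if ($1 + 0 > rel || ($1 + 0 == rel && $2 + 0 > lev)) {
                rel = $1 + 0
                lev = $2 + 0
            }
        }
        END { if (rel > 0) print rel "." lev }
    '
}
```

With trunk deltas 1.1, 1.2, 7.1, 7.6, and 10.1 on the standard input, resolve_release 9 selects 7.6, as in the release 9 example.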

get -t will retrieve the latest (top) version of a particular release when
no -r is used or when its value is simply a release number. The latest
version is the delta produced most recently, independent of its location on the
SCCS file tree. Thus, if the most recent delta in release 3 is 3.5,

get -r3 -t s.abc 

might produce 
3.5 

59 lines 

However, if branch delta 3.2.1.5 were the latest delta (created after delta 
3.5), the same command might produce 

3.2.1.5 
46 lines 

Retrieval With Intent to Make a Delta 

get -e indicates an intent to make a delta. First, get checks the
following.

1. The user list to determine if the login name or group ID of the per- 
son executing get is present. The login name or group ID must be 
present for the user to be allowed to make deltas. (See "The admin 
Command" for a discussion of making user lists.)

2. The release number (R) of the version being retrieved satisfies the
relation

floor is less than or equal to R, which is
less than or equal to ceiling

to determine if the release being accessed is a protected release. The
floor and ceiling are flags in the SCCS file representing start and
end of range.

3. The R is not locked against editing. The lock is a flag in the SCCS
file.

4. Whether multiple concurrent edits are allowed for the SCCS file by
the j flag in the SCCS file.

A failure of any of the first three conditions causes the processing of the 
corresponding SCCS file to terminate. 
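
The floor and ceiling test in the second check is a plain range comparison. A minimal sketch, assuming the three values are already known as integers (the function name release_in_range is made up for this example; get reads the flags from the SCCS file itself):

```shell
# Illustrative only: succeed (exit status 0) when floor <= R <= ceiling.
# Usage: release_in_range RELEASE FLOOR CEILING
release_in_range() {
    release=$1
    floor=$2
    ceiling=$3
    [ "$floor" -le "$release" ] && [ "$release" -le "$ceiling" ]
}
```

For example, release_in_range 2 1 3 succeeds, while release_in_range 4 1 3 fails, which corresponds to get -e refusing to set up a delta in a protected release.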

If the above checks succeed, get -e causes the creation of a g.file in the 
current directory with mode 644 (readable by everyone, writable only by 
the owner) owned by the real user. If a writable g.file already exists, get 
terminates with an error. This is to prevent inadvertent destruction of a 
g.file being edited for the purpose of making a delta. 

Any ID keywords appearing in the g.file are not substituted by get -e
because the generated g.file is subsequently used to create another delta.
Replacement of ID keywords would cause them to be permanently changed in
the SCCS file. Because of this, get does not need to check for their presence
in the g.file. Thus, the message

No id keywords (cm7)

is never output when get -e is used.

In addition, get -e causes the creation (or updating) of a p.file that is
used to pass information to the delta command.

The following

get -e s.abc

produces (for example) on the standard output

1.3
new delta 1.4
67 lines

Undoing a get -e

There may be times when a file is retrieved for editing in error; there is 
really no editing that needs to be done at this time. In such cases, the 
unget command can be used to cancel the delta reservation that was set up. 

Additional get Options 

If get — r and /or — t are used together with — e, the version retrieved for 
editing is the one specified with — r and /or — t. 

get -i and -x are used to specify a list (see get(l) in the Programmer's 
Reference Manual for the syntax of such a list) of deltas to be included and 
excluded, respectively. Including a delta means forcing its changes to be 
included in the retrieved version. This is useful in applying the same 
changes to more than one version of the SCCS file. Excluding a delta 
means forcing it not to be applied. This may be used to undo the effects of 
a previous delta in the version to be created. 

Whenever deltas are included or excluded, get checks for possible 
interference with other deltas. Two deltas can interfere, for example, when 
each one changes the same line of the retrieved g.file. A warning shows 
the range of lines within the retrieved g.file where the problem may exist. 
The user should examine the g.file to determine what the problem is and 
take appropriate corrective steps (e.g., edit the file). 



get — k is used either to regenerate a g.file that may have been acciden- 
tally removed or ruined after get — e, or simply to generate a g,file in which 
the replacement of ID keywords has been suppressed. A g.file generated by 
get — k is identical to one produced by get — e, but no processing related to 
the p.file takes place. 

Concurrent Edits of Different SID 

The ability to retrieve different versions of an SCCS file allows several 
deltas to be in progress at any given time. This means that several get — e 
commands may be executed on the same file as long as no two executions 
retrieve the same version (unless multiple concurrent edits are allowed). 




get — i and get — x should be used with extreme care. 
The p.file created by get -e is named by automatic replacement of the
SCCS filename's prefix s. with p.. It is created in the same directory as the
SCCS file, given mode 644 (readable by everyone, writable only by the
owner), and owned by the effective user. The p.file contains the following
information for each delta that is still in progress:

■ the SID of the retrieved version

■ the SID given to the new delta when it is created

■ the login name of the real user executing get

The first execution of get -e causes the creation of a p.file for the
corresponding SCCS file. Subsequent executions only update the p.file with
a line containing the above information. Before updating, however, get 
checks to assure that no entry already in the p.file specifies that the SID of 
the version to be retrieved is already retrieved (unless multiple concurrent 
edits are allowed). If the check succeeds, the user is informed that other 
deltas are in progress and processing continues. If the check fails, an error 
message results. 
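
The check described above can be pictured with a short sketch. It is only an illustration in Python, not the way get is implemented: the p.file is modelled as an in-memory list, and the names PfileEntry, register_edit, and allow_concurrent are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PfileEntry:
    retrieved_sid: str  # SID of the retrieved version
    new_sid: str        # SID the new delta will receive
    login: str          # login name of the real user executing get


def register_edit(pfile, retrieved_sid, new_sid, login, allow_concurrent=False):
    """Record a new get -e; refuse a second edit of an SID already out.

    Returns True when other deltas were already in progress, which is the
    case in which get informs the user before continuing.
    """
    already_out = any(e.retrieved_sid == retrieved_sid for e in pfile)
    if already_out and not allow_concurrent:
        raise RuntimeError(f"SID {retrieved_sid} is already being edited")
    others_in_progress = len(pfile) > 0
    pfile.append(PfileEntry(retrieved_sid, new_sid, login))
    return others_in_progress
```

An empty list stands for the case in which no p.file exists yet, so the first call creates the first entry and later calls only add lines.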

It should be noted that concurrent executions of get must be carried out 
from different directories. Subsequent executions from the same directory 
will attempt to overwrite the g.file, which is an SCCS error condition. In
practice, this problem does not arise since each user normally has a different
working directory. See "Protection" under "SCCS Files" for a discussion of
how different users are permitted to use SCCS commands on the same files.

Figure 14-4 shows the possible SID components a user can specify with 
get (left-most column), the version that will then be retrieved by get, and 
the resulting SID for the delta, which delta will create (right-most column). 



SID          -b           Other                  SID          SID of Delta
Specified    Keyletter    Conditions             Retrieved    To be Created
in get*      Used†                               by get       by delta

none‡        no           R defaults to mR       mR.mL        mR.(mL+1)

none‡        yes          R defaults to mR       mR.mL        mR.mL.(mB+1).1

R            no           R > mR                 mR.mL        R.1§

R            no           R = mR                 mR.mL        mR.(mL+1)

R            yes          R > mR                 mR.mL        mR.mL.(mB+1).1

R            yes          R = mR                 mR.mL        mR.mL.(mB+1).1

R            -            R < mR and R           hR.mL**      hR.mL.(mB+1).1
                          does not exist

R            -            Trunk successor        R.mL         R.mL.(mB+1).1
                          number in
                          release > R
                          and R exists

R.L          no           No trunk               R.L          R.(L+1)
                          successor

R.L          yes          No trunk               R.L          R.L.(mB+1).1
                          successor

R.L          -            Trunk successor        R.L          R.L.(mB+1).1
                          in release > R

R.L.B        no           No branch              R.L.B.mS     R.L.B.(mS+1)
                          successor

R.L.B        yes          No branch              R.L.B.mS     R.L.(mB+1).1
                          successor

R.L.B.S      no           No branch              R.L.B.S      R.L.B.(S+1)
                          successor

R.L.B.S      yes          No branch              R.L.B.S      R.L.(mB+1).1
                          successor

R.L.B.S      -            Branch successor       R.L.B.S      R.L.(mB+1).1

Figure 14-4: Determination of New SID

*  R, L, B, and S mean release, level, branch, and sequence numbers in the SID, and m
   means maximum. Thus, for example, R.mL means the maximum level number within
   release R. R.L.(mB+1).1 means the first sequence number on the new branch (i.e.,
   maximum branch number plus 1) of level L within release R. Note that if the SID
   specified is R.L, R.L.B, or R.L.B.S, each of these specified SID numbers must exist.

†  The -b keyletter is effective only if the b flag (see admin(1)) is present in the file.
   An entry of - means irrelevant.

‡  This case applies if the d (default SID) flag is not present. If the d flag is present in
   the file, the SID is interpreted as if specified on the command line. Thus, one of the
   other cases in this figure applies.

§  This is used to force the creation of the first delta in a new release.

** hR is the highest existing release that is lower than the specified, nonexistent release
   R.
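
The first six rows of Figure 14-4 (no SID specified, or only a release R with R not less than mR) can be expressed as a small function. The following Python sketch is an illustration under stated assumptions, not the algorithm used by get: SIDs are modelled as tuples, the function name new_sid and its parameters are hypothetical, and the remaining rows of the figure are deliberately not covered.

```python
def new_sid(deltas, release=None, branch=False):
    """Return (retrieved SID, SID of the delta to be created).

    deltas  -- existing SIDs as tuples: (R, L) for trunk deltas,
               (R, L, B, S) for branch deltas
    release -- release R given with get, or None if no SID was specified
    branch  -- True if the -b keyletter is used (and the b flag is set)
    """
    trunk = [d for d in deltas if len(d) == 2]
    m_r = max(r for r, _ in trunk)                   # mR
    m_l = max(l for r, l in trunk if r == m_r)       # mL within release mR
    if release is None:
        release = m_r                                # "R defaults to mR"
    if release < m_r:
        raise ValueError("this sketch covers only R >= mR")
    retrieved = (m_r, m_l)
    if branch:
        # mB is the highest branch number already hanging off mR.mL
        m_b = max((d[2] for d in deltas
                   if len(d) == 4 and d[:2] == retrieved), default=0)
        return retrieved, (m_r, m_l, m_b + 1, 1)
    if release > m_r:
        return retrieved, (release, 1)               # first delta of new release
    return retrieved, (m_r, m_l + 1)
```

For example, with trunk deltas 1.1, 1.2, and 2.1 the call new_sid(deltas, release=3) retrieves 2.1 and names the new delta 3.1, which is the row marked § in the figure.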

Concurrent Edits of Same SID 

Under normal conditions, more than one get -e for the same SID is not
permitted. That is, delta must be executed before a subsequent get -e is
executed on the same SID.

Multiple concurrent edits are allowed if the j flag is set in the SCCS file. 
Thus: 

get -e s.abc
1.1 

new delta 1.2 
5 lines 

may be immediately followed by 

get -e s.abc
1.1 

new delta 1.1.1.1 
5 lines 

without an intervening delta. In this case, a delta after the first get will 
produce delta 1.2 (assuming 1.1 is the most recent trunk delta), and a delta 
after the second get will produce delta 1.1.1.1. 

Keyletters That Affect Output 

get -p causes the retrieved text to be written to the standard output
rather than to a g.file. In addition, all output normally directed to the
standard output (such as the SID of the version retrieved and the number of
lines retrieved) is directed instead to the diagnostic output. get -p is used,
for example, to create a g.file with an arbitrary name, as in

get -p s.abc > arbitrary-file-name

get -s suppresses output normally directed to the standard output, such
as the SID of the retrieved version and the number of lines retrieved, but it
does not affect messages normally directed to the diagnostic output. get -s
is used to prevent nondiagnostic messages from appearing on the user's
terminal and is often used with -p to pipe the output, as in

get -p -s s.abc | pg 

get -g suppresses the retrieval of the text of an SCCS file. This is useful 
in several ways. For example, to verify a particular SID in an SCCS file 

get -g -r4.3 s.abc

outputs the SID 4.3 if it exists in the SCCS file s.abc or an error message if it 
does not. Another use of get -g is in regenerating a p.file that may have 
been accidentally destroyed, as in 

get -e -g s.abc 

get -l causes SCCS to create an "l.file." It is named by replacing the s.
of the SCCS filename with l., created in the current directory with mode
444 (read only) and owned by the real user. The l.file contains a table
(whose format is described under get(1) in the Programmer's Reference
Manual) showing the deltas used in constructing a particular version of the
SCCS file. For example

get -r2.3 -l s.abc

generates an l.file showing the deltas applied to retrieve version 2.3 of file
s.abc. Specifying p with -l, as in

get -lp -r2.3 s.abc

causes the output to be written to the standard output rather than to the
l.file. get -g can be used with -l to suppress the retrieval of the text.

get -m identifies the changes applied to an SCCS file. Each line of the 
g.file is preceded by the SID of the delta that caused the line to be inserted. 
The SID is separated from the text of the line by a tab character. 

get -n causes each line of a g.file to be preceded by the value of the
%M% ID keyword and a tab character. This is most often used in a pipeline
with grep(1). For example, to find all lines that match a given pattern in
the latest version of each SCCS file in a directory, the following may be
executed:

get -p -n -s directory | grep pattern

If both -m and -n are specified, each line of the generated g.file is
preceded by the value of the %M% ID keyword and a tab (this is the effect of
-n) and is followed by the line in the format produced by -m. Because use
of -m and/or -n causes the contents of the g.file to be modified, such a
g.file must not be used for creating a delta. Therefore, neither -m nor -n
may be specified together with get -e.
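
The combined effect of -m and -n on each output line can be sketched as follows. This is an illustrative Python model only: the function name annotate and its parameters are hypothetical, and the input is assumed to be a list of (SID, text) pairs rather than a real SCCS file.

```python
def annotate(lines, module=None, with_sid=False):
    """Prefix each line the way get -n and get -m are described to do.

    lines    -- list of (sid, text) pairs, sid being the delta that
                inserted the line
    module   -- value of the %M% ID keyword (effect of -n), or None
    with_sid -- True to prefix the inserting SID (effect of -m)
    """
    out = []
    for sid, text in lines:
        prefix = ""
        if module is not None:   # -n: %M% value and a tab come first
            prefix += module + "\t"
        if with_sid:             # -m: SID and a tab precede the text
            prefix += sid + "\t"
        out.append(prefix + text)
    return out
```

Because the prefixes become part of every line, output produced this way no longer matches the stored text, which is why such a g.file cannot serve as the basis for a delta.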



NOTE 



See get(1) in the Programmer's Reference Manual for a full description of
additional keyletters. 



The delta Command 

The delta(1) command is used to incorporate changes made to a g.file
into the corresponding SCCS file, that is, to create a delta and, therefore, a
new version of the file.

The delta command requires the existence of a p.file (created via get
-e). It examines the p.file to verify the presence of an entry containing the
user's login name. If none is found, an error message results.

The delta command also performs the same permission checks that
get -e performs. If all checks are successful, delta determines what has
been changed in the g.file by comparing it via diff(1) with its own
temporary copy of the g.file as it was before editing. This temporary copy of
the g.file is called the d.file and is obtained by performing an internal get
on the SID specified in the p.file entry.

The required p.file entry is the one containing the login name of the 
user executing delta, because the user who retrieved the g.file must be the 
one who creates the delta. However, if the login name of the user appears 
in more than one entry, the same user has executed get -e more than once 
on the same SCCS file. Then, delta -r must be used to specify the SID that 
uniquely identifies the p.file entry. This entry is then the one used to 
obtain the SID of the delta to be created. 
In practice, the most common use of delta is

delta s.abc

which prompts

comments?

to which the user replies with a description of why the delta is being made, 
ending the reply with a newline character. The user's response may be up 
to 512 characters long with newlines (not intended to terminate the 
response) escaped by backslashes, \. 

If the SCCS file has a v flag, delta first prompts with

MRs?

(Modification Requests), on the standard output. The standard input is then
read for MR numbers, separated by blanks and/or tabs, ended with a newline
character. A Modification Request is a formal way of asking for a
correction or enhancement to the file. In some controlled environments 
where changes to source files are tracked, deltas are permitted only when 
initiated by a trouble report, change request, trouble ticket, etc., collectively 
called MRs. Recording MR numbers within deltas is a way of enforcing the 
rules of the change management process. 

delta -y and/or -m can be used to enter comments and MR numbers
on the command line rather than through the standard input, as in

delta -y"descriptive comment" -m"mrnum1 mrnum2" s.abc



In this case, the prompts for comments and MRs are not printed, and 
the standard input is not read. These two keyletters are useful when delta 
is executed from within a shell procedure (see sh(1) in the Programmer's
Reference Manual).



NOTE 



delta -m is allowed only if the SCCS file has a v flag.



No matter how comments and MR numbers are entered with delta, they 
are recorded as part of the entry for the delta being created. Also, they 
apply to all SCCS files specified with the delta. 
If delta is used with more than one file argument and the first file
named has a v flag, all files named must have this flag. Similarly, if the first 
file named does not have the flag, none of the files named may have it. 

When delta processing is complete, the standard output displays the SID 
of the new delta (from the p.file) and the number of lines inserted, deleted, 
and left unchanged. For example: 

1.4
14 inserted
7 deleted
345 unchanged

If line counts do not agree with the user's perception of the changes 
made to a g.file, it may be because there are various ways to describe a set
of changes, especially if lines are moved around in the g.file. However, the 
total number of lines of the new delta (the number inserted plus the 
number left unchanged) should always agree with the number of lines in 
the edited g.file. 
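
The relation between the three counts can be checked with a small sketch. It uses Python's difflib in place of diff(1), so the individual counts it reports may differ from those delta would print for the same edit; only the totals are expected to agree. The function name delta_counts is hypothetical.

```python
import difflib


def delta_counts(old_lines, new_lines):
    """Return (inserted, deleted, unchanged) line counts for an edit."""
    inserted = deleted = unchanged = 0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged += i2 - i1
        else:
            # "replace" removes old lines and adds new ones; "delete" and
            # "insert" contribute to only one of the two counts.
            deleted += i2 - i1
            inserted += j2 - j1
    return inserted, deleted, unchanged
```

Whatever way the changes are described, inserted plus unchanged equals the number of lines in the edited file, and deleted plus unchanged equals the number of lines in the version that was retrieved.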

If, in the process of making a delta, the delta command finds no
ID keywords in the edited g.file, the message

No id keywords (cm7)

is issued after the prompts for commentary but before any other output. 
This means that any ID keywords that may have existed in the SCCS file 
have been replaced by their values or deleted during the editing process. 
This could be caused by making a delta from a g.file that was created by a 
get without -e (ID keywords are replaced by get in such a case). It could
also be caused by accidentally deleting or changing ID keywords while
editing the g.file. Or, it is possible that the file had no ID keywords. In any
case, the delta will be created unless there is an i flag in the SCCS file 
(meaning the error should be treated as fatal), in which case the delta will 
not be created. 

After the processing of an SCCS file is complete, the corresponding 
p.file entry is removed from the p.file. All updates to the p.file are made to 
a temporary copy, the "q.file," whose use is similar to the use of the x.file 
described earlier under "SCCS Command Conventions." If there is only one 
entry in the p.file, then the p.file itself is removed. 
In addition, delta removes the edited g.file unless -n is specified. For
example

delta -n s.abc

will keep the g.file after processing.

delta -s suppresses all output normally directed to the standard output,
other than comments? and MRs?. Thus, use of -s with -y (and/or -m)
causes delta to neither read the standard input nor write the standard
output.

The differences between the g.file and the d.file constitute the delta and 
may be printed on the standard output by using delta -p. The format of 
this output is similar to that produced by diff(1).



The admin Command 

The admin(1) command is used to administer SCCS files, that is, to
create new SCCS files and change the parameters of existing ones. When an 
SCCS file is created, its parameters are initialized by use of keyletters with 
admin or are assigned default values if no keyletters are supplied. The 
same keyletters are used to change the parameters of existing SCCS files. 

Two keyletters are used in detecting and correcting corrupted SCCS files 
(see "Auditing" under "SCCS Files"). 

Newly created SCCS files are given access permission mode 444 (read 
only) and are owned by the effective user. Only a user with write permis- 
sion in the directory containing the SCCS file may use the admin command 
on that file. 



Creation of SCCS Files 

An SCCS file can be created by executing the command 

admin -ifirst s.abc

in which the value first with -i is the name of a file from which the text of
the initial delta of the SCCS file s.abc is to be taken. Omission of a value
with -i means admin is to read the standard input for the text of the initial
delta.

The command

admin -i s.abc < first

is equivalent to the previous example.

If the text of the initial delta does not contain ID keywords, the message 

No id keywords (cm7)

is issued by admin as a warning. However, if the command also sets the i
flag (not to be confused with the -i keyletter), the message is treated as an
error and the SCCS file is not created. Only one SCCS file may be created
at a time using admin -i.

admin -r is used to specify a release number for the first delta. Thus: 

admin -ifirst -r3 s.abc 

means the first delta should be named 3.1 rather than the normal 1.1.
Because -r has meaning only when creating the first delta, its use is
permitted only with -i.

Inserting Commentary for the Initial Delta 

When an SCCS file is created, the user may want to record why this was 
done. Comments (admin -y) and/or MR numbers (-m) can be entered in
exactly the same way as with delta.

If -y is omitted, a comment line of the form

date and time created YY/MM/DD HH:MM:SS by logname 

is automatically generated. 

If it is desired to supply MR numbers (admin -m), the v flag must be
set via -f. The v flag simply determines whether MR numbers must be
supplied when using any SCCS command that modifies a delta commentary
(see sccsfile(4) in the Programmer's Reference Manual) in the SCCS file. Thus:

admin -ifirst -mmrnum1 -fv s.abc

Note that -y and -m are effective only if a new SCCS file is being created.

Initialization and Modification of SCCS File Parameters 

Part of an SCCS file is reserved for descriptive text, usually a summary 
of the file's contents and purpose. It can be initialized or changed by using 
admin -t.

When an SCCS file is first being created and -t is used, it must be
followed by the name of a file from which the descriptive text is to be taken.
For example, the command

admin -ifirst -tdesc s.abc

specifies that the descriptive text is to be taken from file desc.

When processing an existing SCCS file, -t specifies that the descriptive
text (if any) currently in the file is to be replaced with the text in the named
file. Thus:

admin -tdesc s.abc

specifies that the descriptive text of the SCCS file is to be replaced by the
contents of desc. Omission of the filename after the -t keyletter as in

admin -t s.abc

causes the removal of the descriptive text from the SCCS file.

The flags of an SCCS file may be initialized or changed by admin -f, or
deleted via -d.

SCCS file flags are used to direct certain actions of the various
commands. (See admin(1) in the Programmer's Reference Manual for a description
of all the flags.) For example, the i flag specifies that a warning message 
(stating that there are no ID keywords contained in the SCCS file) should be 
treated as an error. The d (default SID) flag specifies the default version of 
the SCCS file to be retrieved by the get command. 

admin -f is used to set flags and, if desired, their values. For example 

admin -ifirst -fi -fmmodname s.abc 

sets the i and m (module name) flags. The value modname specified for the 
m flag is the value that the get command will use to replace the %M% ID 
keyword. (In the absence of the m flag, the name of the g.file is used as the 
replacement for the %M% ID keyword.) Several -f keyletters may be sup- 
plied on a single admin, and they may be used whether the command is 
creating a new SCCS file or processing an existing one. 

admin -d is used to delete a flag from an existing SCCS file. As an
example, the command

admin -dm s.abc

removes the m flag from the SCCS file. Several -d keyletters may be used
with one admin and may be intermixed with -f.

SCCS files contain a list of login names and/or group IDs of users who
are allowed to create deltas. This list is empty by default, allowing anyone
to create deltas. To create a user list (or add to an existing one), admin -a
is used. For example,

admin -axyz -awql -a1234 s.abc

adds the login names xyz and wql and the group ID 1234 to the list. admin
-a may be used whether creating a new SCCS file or processing an existing
one.

admin -e (erase) is used to remove login names or group IDs from the
list.
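
The rule for the user list can be summarized in a few lines. This Python sketch is an illustration only: the function name may_create_delta is hypothetical, and the list is assumed to hold login names and group IDs as plain strings, as in the admin -a example above.

```python
def may_create_delta(user_list, login, group_id):
    """Apply the user-list rule: an empty list admits everyone."""
    if not user_list:
        return True
    # Otherwise the login name or the numeric group ID must be listed.
    return login in user_list or str(group_id) in user_list
```

With the list built by the example (xyz, wql, and 1234), user xyz is admitted by name, and any member of group 1234 is admitted by group ID.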



The prs Command 

The prs(1) command is used to print all or part of an SCCS file on the
standard output. If prs -d is used, the output will be in a format called
data specification. Data specification is a string of SCCS file data keywords 
(not to be confused with get ID keywords) interspersed with optional user 
text. 

Data keywords are replaced by appropriate values according to their 
definitions. For example, 

:I: 

is defined as the data keyword replaced by the SID of a specified delta. 
Similarly, :F: is the data keyword for the SCCS filename currently being 
processed, and :C: is the comment line associated with a specified delta. All 
parts of an SCCS file have an associated data keyword. For a complete list, 
see prs(1) in the Programmer's Reference Manual.

There is no limit to the number of times a data keyword may appear in 
a data specification. Thus, for example, 

prs -d":I: this is the top delta for :F: :I:" s.abc

may produce on the standard output

2.1 this is the top delta for s.abc 2.1
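
Data keyword substitution amounts to replacing each :X: token with a value. The following Python sketch illustrates the idea under stated assumptions: the function name expand is hypothetical, only keywords supplied in the values mapping are replaced, and the pattern used for keyword names (one capital letter optionally followed by one letter) is an assumption, not the definition given in prs(1).

```python
import re


def expand(spec, values):
    """Replace :X: data keywords in spec with entries from values.

    Tokens whose name is not in values are left untouched.
    """
    def replace(match):
        return values.get(match.group(1), match.group(0))
    return re.sub(r":([A-Z][A-Za-z]?):", replace, spec)
```

A keyword may appear any number of times in the specification, and each occurrence is replaced independently, while colons that are ordinary user text are passed through.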



SOURCE CODE CONTROL SYSTEM (SCCS) 14-29 




Information may be obtained from a single delta by specifying its SID 
using prs -r. For example,

prs -d":F:: :I: comment line is: :C:" -r1.4 s.abc

may produce the following output:

s.abc: 1.4 comment line is: THIS IS A COMMENT

If -r is not specified, the value of the SID defaults to the most recently
created delta.

In addition, information from a range of deltas may be obtained with -l
or -e. The use of prs -e substitutes data keywords for the SID designated
via -r and all deltas created earlier, while prs -l substitutes data keywords
for the SID designated via -r and all deltas created later. Thus, the
command

prs -d:I: -r1.4 -e s.abc

may output

1.4
1.3
1.2.1.1
1.2
1.1

and the command

prs -d:I: -r1.4 -l s.abc

may produce

3.2
3.1
2.2.1.1
2.2
2.1
1.4

Substitution of data keywords for all deltas of the SCCS file may be
obtained by specifying both -e and -l.
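
The selection made by -e and -l can be modelled as slicing a list of deltas kept in order of creation. This Python sketch is illustrative only: the function name select_deltas is hypothetical, and the history is assumed to be a list of SID strings, oldest first.

```python
def select_deltas(history, sid, earlier=False, later=False):
    """Return the SIDs prs would report, newest first.

    history -- all SIDs in creation order, oldest first
    sid     -- the SID designated via -r
    earlier -- include deltas created before sid (effect of -e)
    later   -- include deltas created after sid (effect of -l)
    """
    at = history.index(sid)
    start = 0 if earlier else at
    stop = len(history) if later else at + 1
    return list(reversed(history[start:stop]))
```

With neither option only the designated delta is reported, and with both the whole history is returned.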



The sact Command 

sact(1) is like a special form of the prs command that produces a report
about files that are out for edit. The command takes only one type of
argument: a list of file or directory names. The report shows the SID of any file
in the list that is out for edit, the SID of the impending delta, the login of
the user who executed the get -e command, and the date and time the get
-e was executed. It is a useful command for an administrator.



The help Command 

The help(1) command prints the syntax of SCCS commands and of
messages that may appear on the user's terminal. Arguments to help are simply
SCCS commands or the code numbers that appear in parentheses after SCCS 
messages. (If no argument is given, help prompts for one.) Explanatory 
information is printed on the standard output. If no information is found, 
an error message is printed. When more than one argument is used, each is 
processed independently, and an error resulting from one will not stop the 
processing of the others. For more information, see help(1) in the UNIX
System V User's Reference Manual.

Explanatory information related to a command is a synopsis of the
command. For example,

help ge5 rmdel

produces

ge5:
"nonexistent sid"
The specified sid does not exist in the
given file.
Check for typos.

rmdel:
rmdel -rSID name ...




The rmdel Command 

The rmdel(1) command allows removal of a delta from an SCCS file. Its
use should be reserved for deltas in which incorrect global changes were 
made. The delta to be removed must be a leaf delta. That is, it must be the 
most recently created delta on its branch or on the trunk of the SCCS file 
tree. In Figure 14-3, only deltas 1.3.1.2, 1.3.2.2, and 2.2 can be removed. 
Only after they are removed can deltas 1.3.2.1 and 2.1 be removed. 

To be allowed to remove a delta, the effective user must have write per- 
mission in the directory containing the SCCS file. In addition, the real user 
must be either the one who created the delta being removed or the owner 
of the SCCS file and its directory. 

The -r keyletter is mandatory with rmdel. It is used to specify the
complete SID of the delta to be removed. Thus,

rmdel -r2.3 s.abc

specifies the removal of trunk delta 2.3. 
Before removing the delta, rmdel checks that the release number (R) of
the given SID satisfies the relation:

floor <= R <= ceiling

The rmdel command also checks the SID to make sure it is not for a
version on which a get for editing has been executed and whose associated
delta has not yet been made. In addition, the login name or group ID of
the user must appear in the file's user list (or the user list must be empty).
Also, the release specified cannot be locked against editing. That is, if the l
flag is set (see admin(1) in the Programmer's Reference Manual), the release
must not be contained in the list. If these conditions are not satisfied,
processing is terminated, and the delta is not removed.
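
Taken together, these conditions form a simple conjunction. The Python sketch below restates them for illustration; it is not the rmdel implementation, the function name rmdel_allowed and all of its parameters are hypothetical, and the leaf-delta and ownership requirements described earlier are left out.

```python
def rmdel_allowed(release, floor, ceiling, locked_releases, being_edited,
                  user_list, login, group_id):
    """Return True only if every check described for rmdel passes."""
    if not (floor <= release <= ceiling):
        return False                      # outside floor/ceiling range
    if being_edited:
        return False                      # a get -e is outstanding on this SID
    if user_list and login not in user_list \
            and str(group_id) not in user_list:
        return False                      # user is not in a non-empty user list
    if release in locked_releases:
        return False                      # release is locked by the l flag
    return True
```

Any single failing check is enough to stop processing, which matches the statement that the delta is not removed unless all conditions are satisfied.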

Once a specified delta has been removed, its type indicator in the delta 
table of the SCCS file is changed from D (delta) to R (removed). 

The cdc Command 

The cdc(1) command is used to change the commentary made when the
delta was created. It is similar to the rmdel command (e.g., -r and full SID
are necessary), although the delta need not be a leaf delta. For example,

cdc -r3.4 s.abc

specifies that the commentary of delta 3.4 is to be changed. New
commentary is then prompted for as with delta.

The old commentary is kept, but it is preceded by a comment line indi- 
cating that it has been superseded, and the new commentary is entered 
ahead of the comment line. The inserted comment line records the login 
name of the user executing cdc and the time of its execution. 

The cdc command also allows for the insertion of new and deletion of
old ("!" prefix) MR numbers. Thus,

cdc -r1.4 s.abc
MRs? mrnum3 !mrnum1     (The MRs? prompt appears only
                         if the v flag has been set.)
comments? deleted wrong MR number and inserted correct MR number

inserts mrnum3 and deletes mrnum1 for delta 1.4.

NOTE

An MR (Modification Request) is described above under the delta
command.



The what Command 

The what(1) command is used to find identifying information within
any UNIX file whose name is given as an argument. No keyletters are
accepted. The what command searches the given file(s) for all occurrences
of the string @(#), which is the replacement for the %Z% ID keyword (see
get(1)). It prints on the standard output whatever follows the string until
the first double quote ("), greater than (>), backslash (\), newline, or
nonprinting NUL character.

For example, if an SCCS file called s.prog.c (a C language program) con- 
tains the following line: 

char id[ ]= "%W%"; 

and the command 

get -r3.4 s.prog.c

is used, the resulting g.file is compiled to produce prog.o and a.out. Then, 
the command 

what prog.c prog.o a.out 

produces 
prog.c:
        prog.c: 3.4
prog.o:
        prog.c: 3.4
a.out:
        prog.c: 3.4



The string searched for by what need not be inserted via an ID keyword of 
get; it may be inserted in any convenient manner. 
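
The search performed by what can be sketched in a few lines. The Python below is an illustration of the rule stated above, not the source of the what command; the function name what_strings is hypothetical, and the input is assumed to be the raw bytes of a file.

```python
MARKER = b"@(#)"
# Characters that end an identification string: double quote,
# greater than, backslash, newline, and NUL.
TERMINATORS = b'">\\\n\x00'


def what_strings(data: bytes):
    """Return every string that follows the @(#) marker in data."""
    found = []
    start = data.find(MARKER)
    while start != -1:
        begin = start + len(MARKER)
        end = begin
        while end < len(data) and data[end] not in TERMINATORS:
            end += 1
        found.append(data[begin:end].decode("latin-1"))
        start = data.find(MARKER, end)
    return found
```

Because the scan works on bytes, it applies equally to source files, object files, and executables, as long as the marker string survives compilation.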



The sccsdiff Command 

The sccsdiff(1) command determines (and prints on the standard
output) the differences between any two versions of an SCCS file. The versions
to be compared are specified with sccsdiff -r in the same way as with get
-r. SID numbers must be specified as the first two arguments. Any
following keyletters are interpreted as arguments to the pr(1) command (which
prints the differences) and must appear before any filenames. The SCCS
file(s) to be processed are named last. Directory names and a name of - (a
lone minus sign) are not acceptable to sccsdiff.

The following is an example of the format of sccsdiff:

sccsdiff -r3.4 -r5.6 s.abc

The differences are printed the same way as by diff(1).



The comb Command 

The comb(1) command lets the user try to reduce the size of an SCCS
file. It generates a shell procedure (see sh(1) in the Programmer's Reference
Manual) on the standard output, which reconstructs the file by discarding
unwanted deltas and combining other specified deltas. (It is not
recommended that comb be used as a matter of routine.)

In the absence of any keyletters, comb preserves only leaf deltas and
the minimum number of ancestor deltas necessary to preserve the shape of
an SCCS tree. The effect of this is to eliminate middle deltas on the trunk
and on all branches of the tree. Thus, in Figure 14-3, deltas 1.2, 1.3.2.1, 1.4,
and 2.1 would be eliminated.

Some of the keyletters used with this command are:

comb -s    This option generates a shell procedure that produces a
           report of the percentage space (if any) the user will save.
           This is often useful as an advance step.

comb -p    This option is used to specify the oldest delta the user wants
           preserved.

comb -c    This option is used to specify a list (see get(1) in the
           Programmer's Reference Manual for its syntax) of deltas the
           user wants preserved. All other deltas will be discarded.

The shell procedure generated by comb is not guaranteed to save space. A 
reconstructed file may even be larger than the original. Note, too, that the 
shape of an SCCS file tree may be altered by the reconstruction process. 



The val Command 

The val(1) command is used to determine whether a file is an SCCS file
meeting the characteristics specified by certain keyletters. It checks for the
existence of a particular delta when the SID for that delta is specified with
-r.

The string following -y or -m is used to check the value set by the t or
m flag, respectively. See admin(1) in the Programmer's Reference Manual for
descriptions of these flags.

The val command treats the special argument - differently from other
SCCS commands. It allows val to read the argument list from the standard
input instead of from the command line, and the standard input is read
until an end-of-file (CTRL-D) is entered. This permits one val command
with different values for keyletters and file arguments. For example,

val -
-yc -mabc s.abc
-mxyz -ypl1 s.xyz

first checks if file s.abc has a value c for its type flag and value abc for the
module name flag. Once this is done, val processes the remaining file, in
this case s.xyz.

The val command returns an 8-bit code. Each bit set shows a specific
error (see val(1) for a description of errors and codes). In addition, an
appropriate diagnostic is printed unless suppressed by -s. A return code of
0 means all files met the characteristics specified.



The vc Command 

The vc(1) command is an awk-like tool used for version control of sets
of files. While it is distributed as part of the SCCS package, it does not 
require the files it operates on to be under SCCS control. A complete 
description of vc may be found in the Programmer's Reference Manual.






SCCS Files



This section covers protection mechanisms used by SCCS, the format of
SCCS files, and the recommended procedures for auditing SCCS files.



Protection 

SCCS relies on the capabilities of the UNIX system for most of the
protection mechanisms required to prevent unauthorized changes to SCCS
files, that is, changes by non-SCCS commands. Protection features
provided directly by SCCS are the release lock flag, the release floor and ceiling
flags, and the user list.

Files created by the admin command are given access permission mode
444 (read only). This mode should remain unchanged because it prevents
modification of SCCS files by non-SCCS commands. Directories containing
SCCS files should be given mode 755, which allows only the owner of the
directory to modify it.

SCCS files should be kept in directories that contain only SCCS files and
any temporary files created by SCCS commands. This simplifies their
protection and auditing. The contents of directories should be logical
groupings, subsystems of the same large project, for example.

SCCS files should have only one link (name) because commands that
modify them do so by creating a copy of the file (the x.file; see "SCCS
Command Conventions"). When processing is done, the old file is automatically
removed and the x.file renamed (s. prefix). If the old file had additional
links, this would break them. Rather than process such files, SCCS
commands produce an error message.

When only one person uses SCCS, the real and effective user IDs are the
same, and the user ID owns the directories containing SCCS files.
Therefore, SCCS may be used directly without any preliminary preparation.

When several users with unique user IDs are assigned SCCS
responsibilities (e.g., on large development projects), one user (that is, one user ID)
must be chosen as the owner of the SCCS files. This person will administer
the files (e.g., use the admin command) and will be SCCS administrator for
the project. Because other users do not have the same privileges and
permissions as the SCCS administrator, they are not able to execute directly
those commands that require write permission in the directory containing
the SCCS files. Therefore, a project-dependent program is required to
provide an interface to the get, delta, and, if desired, rmdel and cdc
commands.

The interface program must be owned by the SCCS administrator and
must have the set user ID on execution bit on (see chmod(1) in the User's
Reference Manual). This assures that the effective user ID is the user ID of
the SCCS administrator. With the privileges of the interface program
during command execution, the owner of an SCCS file can modify it at will.
Other users whose login names or group IDs are in the user list for that file
(but are not the owner) are given the necessary permissions only for the
duration of the execution of the interface program. Thus, they may modify
SCCS files only with delta and, possibly, rmdel and cdc.
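The difference between the real and the effective user ID can be observed
from within a program. The sketch below is illustrative only; the function
name running_with_borrowed_id is hypothetical, and the code says nothing
about how an actual interface program is written.

```c
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 when the effective user ID differs from the real user ID,
 * as it would while a set-user-ID program owned by another user is
 * executing, and 0 otherwise. */
int running_with_borrowed_id(void)
{
    return getuid() != geteuid();
}
```

When run by an ordinary user from an ordinary program, the function
returns 0 because both IDs are the same.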

A project-dependent interface program, as its name implies, can be cus- 
tom built for each project. Its creation is discussed later under "An SCCS 
Interface Program." 



Formatting 

SCCS files are composed of lines of ASCII text arranged in six parts as
follows:

Checksum           a line containing the logical sum of all the characters
                   of the file (not including the checksum itself)

Delta Table        information about each delta, such as type, SID, date
                   and time of creation, and commentary

User Names         list of login names and/or group IDs of users who are
                   allowed to modify the file by adding or removing deltas

Flags              indicators that control certain actions of SCCS
                   commands

Descriptive Text   usually a summary of the contents and purpose of the
                   file

Body               the text administered by SCCS, intermixed with
                   internal SCCS control lines

Details on these file sections may be found in sccsfile(4). The checksum
is discussed below under "Auditing."

Since SCCS files are ASCII files, they can be processed by non-SCCS
commands like ed(1), grep(1), and cat(1). This is convenient when an SCCS
file must be modified manually (e.g., a delta's time and date were recorded
incorrectly because the system clock was set incorrectly), or when a user
wants simply to look at the file.



Extreme care should be exercised when modifying SCCS files with non- 
SCCS commands. 




Auditing 

When a system or hardware malfunction destroys an SCCS file, any 
command will issue an error message. Commands also use the checksum 
stored in an SCCS file to determine whether the file has been corrupted 
since it was last accessed (possibly by having lost one or more blocks or by 
having been modified with ed(1)). No SCCS command will process a
corrupted SCCS file except the admin command with -h or -z, as described
below.

SCCS files should be audited for possible corruptions on a regular basis.
The simplest and fastest way to do an audit is to use admin -h and specify
all SCCS files:

admin -h s.file1 s.file2 ...

or

admin -h directory1 directory2 ...

If the new checksum of any file is not equal to the checksum in the first
line of that file, the message

corrupted file (co6)

is produced for that file. The process continues until all specified files have
been examined. When examining directories (as in the second example
above), the checksum process will not detect missing files. A simple way to
learn whether files are missing from a directory is to execute the ls(1)
command periodically and compare the outputs. Any file whose name
appeared in a previous output but not in the current one no longer exists.
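The idea of a checksum that excludes its own line can be sketched in a few
lines of C. This is an illustration only: it assumes a 16-bit sum over every
byte that follows the first line, and the function name
sum_after_first_line is hypothetical. The exact rule SCCS applies is the
one documented in sccsfile(4), not this sketch.

```c
#include <stdio.h>

/* Illustrative only: sums every byte after the first line of the
 * stream and keeps the low 16 bits of the total. */
unsigned sum_after_first_line(FILE *fp)
{
    unsigned sum = 0;
    int c;

    /* Skip the first line, which holds the stored checksum. */
    while ((c = getc(fp)) != EOF && c != '\n')
        ;
    /* Add up everything that follows. */
    while ((c = getc(fp)) != EOF)
        sum = (sum + (unsigned)c) & 0xFFFF;
    return sum;
}
```

Changing any single byte after the first line changes the sum, which is
why an edit made with a non-SCCS command is reported as a corruption
until the stored value is recomputed.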

When a file has been corrupted, the way to restore it depends on the 
extent of the corruption. If damage is extensive, the best solution is to con- 
tact the local UNIX system operations group and request that the file be 
restored from a backup copy. If the damage is minor, repair through edit- 
ing may be possible. After such a repair, the admin command must be exe- 
cuted: 

admin -z s.file

The purpose of this is to recompute the checksum and bring it into 
agreement with the contents of the file. After this command is executed, 
any corruption that existed in the file will no longer be detectable. 
Introduction



This chapter describes the symbolic debugger, sdb(1), as implemented
for C language and Fortran 77 programs on the UNIX operating system. 
The sdb program is useful both for examining core images of aborted pro- 
grams and for providing an environment in which execution of a program 
can be monitored and controlled. 

The sdb program allows interaction with a debugged program at the 
source language level. When debugging a core image from an aborted pro- 
gram, sdb reports which line in the source program caused the error and 
allows all variables to be accessed symbolically and to be displayed in the 
correct format. 

When a program is executing under sdb, breakpoints may be placed at
selected statements or the program may be single stepped on a line-by-line
basis. To facilitate
specification of lines in the program without a source listing, sdb provides a 
mechanism for examining the source text. Procedures may be called 
directly from the debugger. This feature is useful both for testing indivi- 
dual procedures and for calling user-provided routines, which provide for- 
matted printouts of structured data. 
In order to use sdb to its full capabilities, it is necessary to compile the
source program with the -g option. This causes the compiler to generate
additional information about the variables and statements of the compiled
program. When the -g option has been specified, sdb can be used to obtain
a trace of the called functions at the time of the abort and interactively
display the values of variables.

A typical sequence of shell commands for debugging a core image is 



cc -g prgm.c -o prgm
prgm
Bus error - core dumped
sdb prgm
main:25:    x[i] = 0;
*



The program prgm was compiled with the -g option and then executed. 
An error occurred, which caused a core dump. The sdb program is then 
invoked to examine the core dump to determine the cause of the error. It 
reports that the bus error occurred in function main at line 25 (line 
numbers are always relative to the beginning of the file) and outputs the 
source text of the offending line. The sdb program then prompts the user 
with an *, which shows that it is waiting for a command.

It is useful to know that sdb has a notion of current function and 
current line. In this example, they are initially set to main and 25, respec- 
tively. 

Here sdb was called with one argument, prgm. In general, it takes 
three arguments on the command line. The first is the name of the execut- 
able file that is to be debugged; it defaults to a.out when not specified. The 
second is the name of the core file, defaulting to core; and the third is the 
list of the directories (separated by colons) containing the source of the pro- 
gram being debugged. The default is the current working directory. In the 
example, the second and third arguments defaulted to the correct values, so 
only the first was specified. 



If the error occurred in a function that was not compiled with the -g
option, sdb prints the function name and the address at which the error
occurred. The current line and function are set to the first executable line
in main. If main was not compiled with the -g option, sdb will print an
error message, but debugging can continue for those routines that were
compiled with the -g option.

Figure 15-1, at the end of the chapter, shows a more extensive example
of sdb use.



Printing a Stack Trace 

It is often useful to obtain a listing of the function calls that led to the 
error. This is obtained with the t command. For example: 

*t
sub(x=2,y=3) [prgm.c:25]
inter(i=16012) [prgm.c:96]
main(argc=1,argv=0x7fffff54,envp=0x7fffff5c) [prgm.c:15]

This indicates that the program was stopped within the function sub at line
25 in file prgm.c. The sub function was called with the arguments x=2 and
y=3 from inter at line 96. The inter function was called from main at line
15. The main function is always called by a startup routine with three
arguments often referred to as argc, argv, and envp. Note that argv and
envp are pointers, so their values are printed in hexadecimal.



Examining Variables 

The sdb program can be used to display variables in the stopped pro- 
gram. Variables are displayed by typing their name followed by a slash, so 

*errflag/

causes sdb to display the value of variable errflag. Unless otherwise 
specified, variables are assumed to be either local to or accessible from the 
current function. To specify a different function, use the form 

*sub:i/

to display variable i in function sub. FORTRAN 77 users can specify a com- 
mon block variable in the same manner, provided it is on the call stack. 
The sdb program supports a limited form of pattern matching for
variable and function names. The symbol * is used to match any sequence of
characters of a variable name and ? to match any single character. Consider
the following commands:

*x*/ 

*sub:y?/ 
**/ 

The first prints the values of all variables beginning with x, the second 
prints the values of all two letter variables in function sub beginning with 
y, and the last prints all variables. In the first and last examples, only vari- 
ables accessible from the current function are printed. The command 

**:*/ 

displays the variables for each function on the call stack. 

The sdb program normally displays the variable in a format determined 
by its type as declared in the source program. To request a different format, 
a specifier is placed after the slash. The specifier consists of an optional 
length specification followed by the format. The length specifiers are: 

b    one byte
h    two bytes (half word)
l    four bytes (long word)

The length specifiers are effective only with the formats d, o, x, and u. If 
no length is specified, the word length of the host machine is used. A 
number can be used with the s or a formats to control the number of char- 
acters printed. The s and a formats normally print characters until either a 
null is reached or 128 characters have been printed. The number specifies 
exactly how many characters should be printed. 

There are a number of format specifiers available:

c    character
d    decimal
u    decimal unsigned
o    octal
x    hexadecimal
f    32-bit single-precision floating point
g    64-bit double-precision floating point
s    Assume variable is a string pointer and print characters starting at
     the address pointed to by the variable until a null is reached.
a    Print characters starting at the variable's address until a null is
     reached.
p    Pointer to function.
i    Interpret as a machine-language instruction.

For example, the variable i can be displayed with 
*i/x 

which prints out the value of i in hexadecimal. 
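Several of the integer format letters have the same meaning as the printf
conversions of the same name, which may make them easier to remember.
The sketch below renders one value under each letter. The function name
show_as is hypothetical, and the layout that sdb itself prints may differ
from what printf produces.

```c
#include <stdio.h>

/* Formats value with the printf conversion that shares its letter
 * with the given format letter. Returns the number of characters
 * written, or -1 for a letter this sketch does not handle. */
int show_as(char *buf, size_t size, char fmt, int value)
{
    switch (fmt) {
    case 'd': return snprintf(buf, size, "%d", value);            /* decimal */
    case 'u': return snprintf(buf, size, "%u", (unsigned)value);  /* unsigned */
    case 'o': return snprintf(buf, size, "%o", (unsigned)value);  /* octal */
    case 'x': return snprintf(buf, size, "%x", (unsigned)value);  /* hex */
    case 'c': return snprintf(buf, size, "%c", value);            /* character */
    default:  return -1;
    }
}
```

For example, the value 255 is rendered as ff under x and as 377 under o.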

sdb also knows about structures, arrays, and pointers so that all of the 
following commands work. 

*array[2][3]/
*sym.id/
*psym->usage/
*xsym[20].p->usage/

The only restriction is that array subscripts must be numbers. Note that as a 
special case: 

*psym[0] 

displays the structure pointed to by psym in decimal. 
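The special case follows the C rule that p[0] and *p name the same
object. A minimal sketch, using a hypothetical struct sym whose field names
merely echo the examples above:

```c
/* Hypothetical symbol-table entry; the field names echo the sdb
 * examples and are not taken from any real program. */
struct sym {
    int id;
    int usage;
};

/* Returns 1 when p[0] and *p refer to the same object, which is why
 * displaying psym[0] shows the structure that psym points to. */
int first_element_is_target(struct sym *p)
{
    return &p[0] == &*p && p[0].usage == p->usage;
}
```

The function returns 1 for any valid pointer, since the two expressions
are equivalent by definition in C.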

Core locations can also be displayed by specifying their absolute 
addresses. The command 

*1024/ 

displays location 1024 in decimal. As in C language, numbers may also be 
specified in octal or hexadecimal so the above command is equivalent to 
both 

*02000/

and

*0x400/
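The equivalence of the three spellings can be confirmed in C itself, which
uses the same notation. The names below are hypothetical and serve only
to label the constants.

```c
/* The same location written three ways; C reads a leading 0 as octal
 * and a leading 0x as hexadecimal. */
enum { ADDR_DECIMAL = 1024, ADDR_OCTAL = 02000, ADDR_HEX = 0x400 };

/* Returns 1 when all three spellings denote the same number. */
int same_address(void)
{
    return ADDR_DECIMAL == ADDR_OCTAL && ADDR_OCTAL == ADDR_HEX;
}
```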

It is possible to mix numbers and variables so that

*1000.x/

refers to an element of a structure starting at address 1000, and

*1000->x/

refers to an element of a structure whose address is at 1000. For commands
of the type *1000.x/ and *1000->x/, the sdb program uses the structure
template of the last structure referenced.

The address of a variable is printed with =, so 

*i= 

displays the address of i. Another feature whose usefulness will become 
apparent later is the command 

*./ 

which redisplays the last variable typed. 

Source File Display and Manipulation 

The sdb program has been designed to make it easy to debug a program 
without constant reference to a current source listing. Facilities are pro- 
vided that perform context searches within the source files of the program 
being debugged and that display selected portions of the source files. The 
commands are similar to those of the UNIX system text editor ed(1). Like
the editor, sdb has a notion of current file and line within the current file.
sdb also knows how the lines of a file are partitioned into functions, so it
also has a notion of current function. As noted in other parts of this
document, the current function is used by a number of sdb commands.

Displaying the Source File 

Four commands exist for displaying lines in the source file. They are 
useful for perusing the source program and for determining the context of 
the current line. The commands are: 

p            Prints the current line.

w            Window; prints a window of ten lines around the current
             line.

z            Prints ten lines starting at the current line. Advances the
             current line by ten.

control-d    Scrolls; prints the next ten lines and advances the current
             line by ten. This command is used to cleanly display long
             segments of the program.

When a line from a file is printed, it is preceded by its line number. 
This not only gives an indication of its relative position in the file, but it is 
also used as input by some sdb commands. 

Changing the Current Source File or Function 

The e command is used to change the current source file. Either of the 
forms 

*e function 
*e file.c 

may be used. The first causes the file containing the named function to 
become the current file, and the current line becomes the first line of the 
function. The other form causes the named file to become current. In this 
case, the current line is set to the first line of the named file. Finally, an e
command with no argument causes the names of the current function and
file to be printed.

Changing the Current Line in the Source File 

The z and control-d commands have a side effect of changing the 
current line in the source file. The following paragraphs describe other 
commands that change the current line. 

There are two commands for searching for instances of regular expres- 
sions in source files. They are 

*/regular expression/
*?regular expression?

The first command searches forward through the file for a line containing
a string that matches the regular expression and the second searches
backwards. The trailing / and ? may be omitted from these commands.
Regular expression matching is identical to that of ed(1).

The + and - commands may be used to move the current line forward
or backward by a specified number of lines. Typing a new-line advances 
the current line by one, and typing a number causes that line to become the 
current line in the file. These commands may be combined with the display 
commands so that 

*+15z 

advances the current line by 15 and then prints ten lines. 



A Controlled Environment for Program Testing 

One very useful feature of sdb is breakpoint debugging. After entering 
sdb, breakpoints can be set at certain lines in the source program. The pro- 
gram is then started with an sdb command. Execution of the program 
proceeds as normal until it is about to execute one of the lines at which a 
breakpoint has been set. The program stops and sdb reports the breakpoint 
where the program stopped. Now, sdb commands may be used to display 
the trace of function calls and the values of variables. If the user is satisfied 
the program is working correctly to this point, some breakpoints can be 
deleted and others set; then program execution may be continued from the 
point where it stopped. 

A useful alternative to setting breakpoints is single stepping. sdb can
be requested to execute the next line of the program and then stop. This
feature is especially useful for testing new programs, so they can be verified
on a statement-by-statement basis. If an attempt is made to single step
through a function that has not been compiled with the -g option,
execution proceeds until a statement in a function compiled with the -g
option is reached. It is also possible to have the program execute one
machine-level instruction at a time. This is particularly useful when the
program has not been compiled with the -g option.

Setting and Deleting Breakpoints

Breakpoints can be set at any line in a function compiled with the -g
option. The command format is:

*12b
*proc:12b
*proc:b
*b

The first form sets a breakpoint at line 12 in the current file. The line
numbers are relative to the beginning of the file as printed by the source 
file display commands. The second form sets a breakpoint at line 12 of 
function proc, and the third sets a breakpoint at the first line of proc. The 
last sets a breakpoint at the current line. 

Breakpoints are deleted similarly with the d command: 

*12d
*proc:12d
*proc:d

In addition, if the command d is given alone, the breakpoints are deleted
interactively. Each breakpoint location is printed, and a line is read from
the user. If the line begins with a y or d, the breakpoint is deleted.

A list of the current breakpoints is printed in response to a B command,
and the D command deletes all breakpoints. It is sometimes desirable to
have sdb automatically perform a sequence of commands at a breakpoint
and then have execution continue. This is achieved with another form of
the b command. For example,

*12b t;x/

causes both a traceback and the value of x to be printed each time
execution gets to line 12. The a command is a variation of the above
command. There are two forms:

*proc:a
*proc:12a

The first prints the function name and its arguments each time it is called, 
and the second prints the source line each time it is about to be executed. 
For both forms of the a command, execution continues after the function 
name or source line is printed. 
Running the Program

The r command is used to begin program execution. It restarts the pro- 
gram as if it were invoked from the shell. The command 

*r args

runs the program with the given arguments as if they had been typed on 
the shell command line. If no arguments are specified, then the arguments 
from the last execution of the program within sdb are used. To run a pro- 
gram with no arguments, use the R command. 

After the program is started, execution continues until a breakpoint is 
encountered, a signal such as INTERRUPT or QUIT occurs, or the program 
terminates. In all cases after an appropriate message is printed, control 
returns to the user. 

The c command may be used to continue execution of a stopped pro- 
gram. A line number may be specified, as in: 

*proc:12c

This places a temporary breakpoint at the named line. The breakpoint is 
deleted when the c command finishes. There is also a C command that con- 
tinues but passes the signal that stopped the program back to the program. 
This is useful for testing user-written signal handlers. Execution may be 
continued at a specified line with the g command. For example: 

*17g 

continues at line 17 of the current function. A use for this command is to
avoid executing a section of code that is known to be bad. The user should
not attempt to continue execution in a function different from that of the
breakpoint.
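A user-written signal handler of the kind the C command is meant to
exercise might look like the following sketch. All of the names are
hypothetical; the code only shows a handler that records the arrival of
SIGINT so that its effect can be examined afterward.

```c
#include <signal.h>

/* Set by the handler; sig_atomic_t is the type C guarantees can be
 * written safely from a signal handler. */
static volatile sig_atomic_t got_interrupt = 0;

/* User-written handler: records that the signal arrived. */
static void on_interrupt(int signo)
{
    (void)signo;
    got_interrupt = 1;
}

/* Installs the handler for SIGINT; returns 0 on success, -1 on failure. */
int install_interrupt_handler(void)
{
    return signal(SIGINT, on_interrupt) == SIG_ERR ? -1 : 0;
}

/* Reports whether the handler has run since the program started. */
int interrupt_seen(void)
{
    return got_interrupt != 0;
}
```

With a program like this stopped by an interrupt, continuing with C
should deliver the signal to on_interrupt, whereas continuing with c
would not.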

The s command is used to run the program for a single statement. It is 
useful for slowly executing the program to examine its behavior in detail. 
An important alternative is the S command. This command is like the s 
command but does not stop within called functions. It is often used when 
one is confident that the called function works correctly but is interested in 
testing the calling routine. 
The i command is used to run the program one machine-level instruction
at a time while ignoring the signal that stopped the program. Its uses
are similar to those of the s command. There is also an I command that
causes the program to execute one machine-level instruction at a time, but
also passes the signal that stopped the program back to the program.

Calling Functions 

It is possible to call any of the functions of the program from sdb. This 
feature is useful both for testing individual functions with different argu- 
ments and for calling a user-supplied function to print structured data. 
There are two ways to call a function: 

*proc(arg1, arg2, ...)
*proc(arg1, arg2, ...)/m

The first simply executes the function. The second is intended for calling 
functions (it executes the function and prints the value that it returns). The 
value is printed in decimal unless some other format is specified by m. 
Arguments to functions may be integer, character or string constants, or 
variables that are accessible from the current function. 
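A user-supplied function for printing structured data can be very small.
The sketch below uses a hypothetical struct point and a hypothetical
function print_point; a real program would substitute its own record type.

```c
#include <stdio.h>

/* Hypothetical record type standing in for a program's structured data. */
struct point {
    int x;
    int y;
};

/* Prints one record in a readable layout and returns the number of
 * characters written, so a caller can tell that output was produced. */
int print_point(struct point *p)
{
    return printf("point(x=%d, y=%d)\n", p->x, p->y);
}
```

If the stopped function has a pointer variable of this type, say pp, the
function could be invoked from the debugger as print_point(pp).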

An unfortunate bug in the current implementation is that if a function 
is called when the program is not stopped at a breakpoint (such as when a 
core image is being debugged) all variables are initialized before the func- 
tion is started. This makes it impossible to use a function that formats data 
from a dump. 



Machine Language Debugging 

The sdb program has facilities for examining programs at the machine 
language level. It is possible to print the machine language statements asso- 
ciated with a line in the source and to place breakpoints at arbitrary 
addresses. The sdb program can also be used to display or modify the con- 
tents of the machine registers. 

Displaying Machine Language Statements 

To display the machine language statements associated with line 25 in 
function main, use the command 

*main:25?

The ? command is identical to the / command except that it displays from
text space. The default format for printing text space is the i format, which
interprets the machine language instruction. The control-d command may
be used to print the next ten instructions.

Absolute addresses may be specified instead of line numbers by append- 
ing a : to them so that 

*0x1024:?

displays the contents of address 0x1024 in text space. Note that the
command

*0x1024?

displays the instruction corresponding to line 0x1024 in the current
function. It is also possible to set or delete a breakpoint by specifying its
absolute address:

*0x1024:b

sets a breakpoint at address 0x1024. 

Manipulating Registers 

The x command prints the values of all the registers. Also, individual
registers may be named by appending a % sign to their name so that

*r3%

displays the value of register r3. 



Other Commands 

To exit sdb, use the q command. 

The ! command (when used immediately after the * prompt) is identical
to that in ed(1) and is used to have the shell execute a command. The ! can
also be used to change the values of variables or registers when the
program is stopped at a breakpoint. This is done with the command

*variable!value
*r3!value

which sets the variable or the named register to the given value. The value
may be a number, character constant, register, or the name of another vari- 
able. If the variable is of type float or double, the value can also be a 
floating-point constant (specified according to the standard C language for- 
mat). 



An sdb Session 

An example of a debugging session using sdb is shown in Figure 15-1. 
Comments (preceded by a pound sign, #) have been added to help you see 
what is happening. 



sdb myoptim - .:../common          # enter sdb command
Source path: .:../common
No core image
*window:b                          # set a breakpoint at start of window
0x80802462 (window:1459+2) b
*r < m.s > out.m.s                 # run the program
Breakpoint at
0x80802462 in window:1459: window(size, func) register int size; boolean (*func)(); {
*t                                 # print stack trace
window(size=2,func=w2opt) [optim.c:1459]
peep() [peep.c:34]
pseudo(s=.def^Imain;^I.val^I.;^I.scl^I-1;^I.endef) [local.c:483]
yylex() [local.c:229]
main(argc=0,argv=0xc00201bc,-1073610300) [optim.c:227]
*z                                 # print 10 lines of source
1459: window(size, func) register int size; boolean (*func)(); {
1460:
1461:         extern NODE *initw();
1462:         register NODE *p1;
1463:         register int i;
1464:
1465:         TRACE(window);
1466:
1467:         /* find first window */
1468:
*s                                 # step
window:1459: window(size, func) register int size; boolean (*func)(); {
*s                                 # step
window:1465:         TRACE(window);
*s                                 # step
window:1469:         wsize = size;
*s                                 # step
window:1470:         if ((p1 = initw(n0.forw)) == NULL)
*S                                 # step through procedure call
window:1475:         for (opf = pf->back; ; opf = pf->back) {
*p1                                # show variable p1
0x80886b38
*x                                 # print the register contents
r0/  0x80886b38    r1/                r2/  0x8088796c
r3/  0x80885830    r4/  0xc0020470    r5/  0xc00203f0
r6/  0xc0020478    r7/  0x80886b38    r8/  2
ap/  0xc00202dc    fp/  0xc0020308    sp/  0xc0020308
psw/ 0x201f73      pc/  0x808024b0
0x808024b0 (window:1475)  MOVW 0x80880d8c,%r0 [-0x7f77f274,%r0]
*p1[0]                             # dereference the pointer
p1[0].forw/ 0x80886b6c
p1[0].back/ 0x80886ac8
p1[0].ops[0]/ pushw
p1[0].uniqid/
p1[0].op/ 123
p1[0].nlive/ 3588
p1[0].ndead/ 4096
*p1->forw[0]                       # dereference the pointer
p1->forw[0].forw/ 0x80886ca0
p1->forw[0].back/ 0x80886b38
p1->forw[0].ops[0]/ call
p1->forw[0].uniqid/
p1->forw[0].op/ 9
p1->forw[0].nlive/ 3584
p1->forw[0].ndead/ 4099
*p1!p1->forw                       # replace p1 with p1->forw
*p1                                # show p1
0x80886b6c
*c                                 # continue
Breakpoint at
0x80802462 in window:1459: window(size, func) register int size; boolean (*func)(); {
*s                                 # step
window:1459: window(size, func) register int size; boolean (*func)(); {
*s                                 # step
window:1465:         TRACE(window);
*size                              # show function argument size
3
*D                                 # delete all breakpoints
All breakpoints deleted
*c                                 # continue
Process terminated
*q                                 # quit sdb
$



Figure 15-1: Example of sdb Usage 
Introduction



The lint program examines C language source programs detecting a 
number of bugs and obscurities. It enforces the type rules of C language 
more strictly than the C compiler. It may also be used to enforce a number 
of portability restrictions involved in moving programs between different 
machines and/or operating systems. Another option detects a number of
wasteful or error prone constructions, which nevertheless are legal. lint 
accepts multiple input files and library specifications and checks them for 
consistency. 
Usage

The lint command has the form: 

lint [options] files ... library-descriptors ... 

where options are optional flags to control lint checking and messages; files
are the files to be checked, which end with .c or .ln; and library-descriptors
are the names of libraries to be used in checking the program.

The options that are currently supported by the lint command are: 

-a         Suppress messages about assignments of long values to
           variables that are not long.

-b         Suppress messages about break statements that cannot be
           reached.

-c         Only check for intra-file bugs; leave external information in
           files suffixed with .ln.

-h         Do not apply heuristics (which attempt to detect bugs,
           improve style, and reduce waste).

-n         Do not check for compatibility with either the standard or the
           portable lint library.

-o name    Create a lint library from input files named llib-lname.ln.

-p         Attempt to check portability.

-u         Suppress messages about function and external variables used
           and not defined or defined and not used.

-v         Suppress messages about unused arguments in functions.

-x         Do not report variables referred to by external declarations
           but never used.

When more than one option is used, they should be combined into a single
argument, such as -ab or -xha.

The names of files that contain C language programs should end with 
the suffix .c, which is mandatory for lint and the C compiler. 



lint accepts certain arguments, such as:

-lm

These arguments specify libraries that contain functions used in the C
language program. The source code is tested for compatibility with these
libraries. This is done by accessing library description files whose names
are constructed from the library arguments. These files all begin with the
comment:

/* LINTLIBRARY */

which is followed by a series of dummy function definitions. The critical 
parts of these definitions are the declaration of the function return type, 
whether the dummy function returns a value, and the number and types of 
arguments to the function. The VARARGS and ARGSUSED comments can 
be used to specify features of the library functions. The next section, "lint 
Message Types," describes how it is done. 

lint library files are processed almost exactly like ordinary source files.
The only difference is that functions which are defined in a library file but
are not used in a source file do not result in messages. lint does not
simulate a full library search algorithm and will print messages if the source
files contain a redefinition of a library routine.

By default, lint checks the programs it is given against a standard
library file that contains descriptions of the programs that are normally
loaded when a C language program is run. When the -p option is used,
another file is checked containing descriptions of the standard library
routines which are expected to be portable across various machines. The -n
option can be used to suppress all library checking.






lint Message Types 

The following paragraphs describe the major categories of messages 
printed by lint. 



Unused Variables and Functions 

As sets of programs evolve and develop, previously used variables and 
arguments to functions may become unused. It is not uncommon for exter- 
nal variables or even entire functions to become unnecessary and yet not be 
removed from the source. These types of errors rarely cause working pro- 
grams to fail, but are a source of inefficiency and make programs harder to 
understand and change. Also, information about such unused variables and 
functions can occasionally serve to discover bugs. 

lint prints messages about variables and functions which are defined
but not otherwise mentioned, unless the message is suppressed by means of
the -u or -x option.

Certain styles of programming may permit a function to be written with 
an interface where some of the function's arguments are optional. Such a 
function can be designed to accomplish a variety of tasks depending on 
which arguments are used. Normally lint prints messages about unused 
arguments; however, the -v option is available to suppress the printing of 
these messages. When -v is in effect, no messages are produced about 
unused arguments except for those arguments which are unused and also 
declared as register arguments. This can be considered an active (and 
preventable) waste of the register resources of the machine. 

Messages about unused arguments can be suppressed for one function 
by adding the comment: 

/* ARGSUSED */ 

to the source code before the function. This has the effect of the -v option 
for only one function. Also, the comment: 

/* VARARGS */ 

can be used to suppress messages about variable number of arguments in 
calls to a function. The comment should be added before the function 
definition. In some cases, it is desirable to check the first several arguments 
and leave the later arguments unchecked. This can be done with a digit 
giving the number of arguments which should be checked. For example: 

/* VARARGS2 */ 

will cause only the first two arguments to be checked. 
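
As a rough sketch of where these comments go, consider the two functions below. The names on_event and sum_values are hypothetical and chosen only for illustration, and the code uses present-day prototypes and <stdarg.h> rather than the older declaration style in use when lint was written.

```c
#include <stdarg.h>

/*
 * Hypothetical callback: the interface requires both arguments, but this
 * version ignores "context". The comment asks lint not to report it.
 */
/* ARGSUSED */
int on_event(int code, void *context)
{
    return code * 2;
}

/*
 * Hypothetical variadic helper: "count" and "first" are always present.
 * VARARGS2 asks lint to check only those two arguments in each call.
 */
/* VARARGS2 */
int sum_values(int count, int first, ...)
{
    va_list ap;
    int total = first;
    int i;

    va_start(ap, first);
    for (i = 1; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as sum_values(3, 1, 2, 3) passes two checked arguments followed by two that lint is told to leave alone.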

When lint is applied to some but not all files out of a collection that are 
to be loaded together, it issues complaints about unused or undefined vari- 
ables. This information is, of course, more distracting than helpful. Func- 
tions and variables that are defined may not be used; conversely, functions 
and variables defined elsewhere may be used. The -u option suppresses 
the spurious messages. 



Set/Used Information 

lint attempts to detect cases where a variable is used before it is set. 
lint detects local variables (automatic and register storage classes) whose 
first use appears physically earlier in the input file than the first assignment 
to the variable. It assumes that taking the address of a variable constitutes a 
"use" since the actual use may occur at any later time, in a data dependent 
fashion. 

The restriction to the physical appearance of variables in the file makes 
the algorithm very simple and quick to implement since the true flow of 
control need not be discovered. It does mean that lint can print error mes- 
sages about program fragments that are legal, but these programs would 
probably be considered bad on stylistic grounds. Because static and external 
variables are initialized to zero, no meaningful information can be 
discovered about their uses. The lint program does deal with initialized 
automatic variables. 

The set/used information also permits recognition of those local variables 
that are set and never used. These form a frequent source of 
inefficiencies and may also be symptomatic of bugs. 
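
The following sketch shows the kind of legal fragment that a purely textual set/used check would be expected to flag. The function names are hypothetical. In the first function the use of x appears earlier in the file than the assignment, although the assignment always executes first.

```c
/* Legal, but the first use of x is physically earlier than its assignment. */
int flagged_but_legal(void)
{
    int x;

    goto set;
use:
    return x + 1;      /* textually first: looks like a use before set */
set:
    x = 41;            /* at run time this executes before the use above */
    goto use;
}

/* The same computation written so that the assignment comes first. */
int clean(void)
{
    int x = 41;

    return x + 1;
}
```

Both functions return the same value; only the first depends on control flow that a check based on physical position cannot see.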



Flow of Control 

lint attempts to detect unreachable portions of a program. It will print 
messages about unlabeled statements immediately following goto, break, 
continue, or return statements. It attempts to detect loops that cannot be 
left at the bottom and to recognize the special cases while(1) and for(;;) as 
infinite loops. lint also prints messages about loops that cannot be entered 
at the top. Valid programs may have such loops, but they are considered to 
be bad style. If you do not want messages about unreached portions of the 
program, use the -b option. 

lint has no way of detecting functions that are called and never return. 
Thus, a call to exit may cause unreachable code which lint does not detect. 
The most serious effects of this are in the determination of returned func- 
tion values (see "Function Values"). If a particular place in the program is 
thought to be unreachable in a way that is not apparent to lint, the com- 
ment 

/* NOTREACHED */ 

can be added to the source code at the appropriate place. This comment 
will inform lint that a portion of the program cannot be reached, and lint 
will not print a message about the unreachable portion. 
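
A minimal sketch of the comment in use follows. The helper fatal and the function safe_divide are hypothetical names introduced here for illustration; fatal stands in for any function that, like exit, never returns.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper that never returns to its caller. */
void fatal(const char *message)
{
    fprintf(stderr, "fatal: %s\n", message);
    exit(1);
}

int safe_divide(int numerator, int denominator)
{
    if (denominator != 0)
        return numerator / denominator;
    fatal("division by zero");
    /* NOTREACHED */
    return 0;   /* kept only for compilers that cannot see fatal never returns */
}
```

The comment marks the point after the call to fatal as unreachable, which is information lint cannot work out for itself.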

Programs generated by yacc and especially lex may have hundreds of 
unreachable break statements, but messages about them are of little impor- 
tance. There is typically nothing the user can do about them, and the 
resulting messages would clutter up the lint output. The recommendation 
is to invoke lint with the -b option when dealing with such input. 

Function Values 

Sometimes functions return values that are never used. Sometimes programs 
incorrectly use function values that have never been returned. lint 
addresses this problem in a number of ways. 

Locally, within a function definition, the appearance of both 

return ( expr ); 

and 

return ; 

statements is cause for alarm; lint will give the message 

function name has return(e) and return 

The most serious difficulty with this is detecting when a function return is 
implied by flow of control reaching the end of the function. This can be 
seen with a simple example: 

f ( a ) { 
    if ( a ) return ( 3 ); 
    g (); 
} 

Notice that, if a tests false, f will call g and then return with no defined 
return value; this will trigger a message from lint. If g, like exit, never 
returns, the message will still be produced when in fact nothing is wrong. 
A comment 

/*NOTREACHED*/ 

in the source code will cause the message to be suppressed. In practice, 
some potentially serious bugs have been discovered by this feature. 

On a global scale, lint detects cases where a function returns a value 
that is sometimes or never used. When the value is never used, it may con- 
stitute an inefficiency in the function definition that can be overcome by 
specifying the function as being of type (void). For example: 

(void) fprintf (stderr, "File busy. Try again later!\n"); 

When the value is sometimes unused, it may represent bad style (e.g., not 
testing for error conditions). 
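
The two cases can be contrasted in a short sketch. The function names report_busy and write_line are hypothetical; the first discards the value of fprintf on purpose and says so with a cast, and the second tests it.

```c
#include <stdio.h>

/* Value deliberately discarded: the cast documents the decision. */
void report_busy(FILE *out)
{
    (void) fprintf(out, "File busy. Try again later!\n");
}

/* Value tested: a failed write is reported to the caller. */
int write_line(FILE *out, const char *text)
{
    if (fprintf(out, "%s\n", text) < 0)
        return -1;
    return 0;
}
```

Either style makes the treatment of the returned value explicit, which is the habit these messages are meant to encourage.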

The opposite problem, using a function value when the function does 
not return one, is also detected. This is a serious problem. 



Type Checking 

lint enforces the type checking rules of C language more strictly than 
the compilers do. The additional checking is in four major areas: 

■ across certain binary operators and implied assignments 

■ at the structure selection operators 

■ between the definition and uses of functions 

■ in the use of enumerations 

There are a number of operators which have an implied balancing 
between types of the operands. The assignment, conditional ( ?: ), and relational 
operators have this property. The argument of a return statement and 
expressions used in initialization suffer similar conversions. In these operations, 
char, short, int, long, unsigned, float, and double types may be freely 
intermixed. The types of pointers must agree exactly except that arrays of 
xs can, of course, be intermixed with pointers to xs. 

The type checking rules also require that, in structure references, the 
left operand of the -> be a pointer to structure, the left operand of the . 
be a structure, and the right operand of these operators be a member of the 
structure implied by the left operand. Similar checking is done for refer- 
ences to unions. 

Strict rules apply to function argument and return value matching. The 
types float and double may be freely matched, as may the types char, short, 
int, and unsigned. Also, pointers can be matched with the associated 
arrays. Aside from this, all actual arguments must agree in type with their 
declared counterparts. 

With enumerations, checks are made that enumeration variables or 
members are not mixed with other types or other enumerations and that the 
only operations applied are =, initialization, ==, !=, and function arguments 
and return values. 

If it is desired to turn off strict type checking for an expression, the 
comment 

/* NOSTRICT */ 

should be added to the source code immediately before the expression. This 
comment will prevent strict type checking for only the next line in the pro- 
gram. 



Type Casts 

The type cast feature in C language was introduced largely as an aid to 
producing more portable programs. Consider the assignment 

p = 1 ; 

where p is a character pointer. lint will print a message as a result of 
detecting this. Consider the assignment 

p = (char *)1 ; 

in which a cast has been used to convert the integer to a character pointer. 
The programmer obviously had a strong motivation for doing this and has 
clearly signaled his intentions. Nevertheless, lint will continue to print 
messages about this. 

Nonportable Character Use 

On some systems, characters are signed quantities with a range from 
-128 to 127. On other C language implementations, characters take on 
only positive values. Thus, lint will print messages about certain comparisons 
and assignments as being illegal or nonportable. For example, the fragment 

char c; 

if( (c = getchar( )) < 0 ) ... 

will work on one machine but will fail on machines where characters 
always take on positive values. The real solution is to declare c as an 
integer since getchar is actually returning integer values. In any case, lint 
will print the message 

nonportable character comparison 

A similar issue arises with bit fields. When assignments of constant 
values are made to bit fields, the field may be too small to hold the value. 
This is especially true because on some machines bit fields are considered as 
signed quantities. While it may seem illogical to consider that a two-bit field 
declared of type int cannot hold the value 3, the problem disappears if the 
bit field is declared to have type unsigned. 
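
Both remedies can be sketched briefly. The names count_bytes, struct flags, and store_mode are hypothetical. The first function keeps the value returned by getc in an int so that the end-of-file indicator stays distinct from every character; the second uses an unsigned two-bit field.

```c
#include <stdio.h>

/* Counts the bytes remaining on a stream. */
long count_bytes(FILE *in)
{
    int c;              /* int, not char: getc returns an int */
    long n = 0;

    while ((c = getc(in)) != EOF)
        n++;
    return n;
}

/* A two-bit unsigned field holds the values 0 through 3. */
struct flags {
    unsigned int mode : 2;
};

unsigned int store_mode(unsigned int value)
{
    struct flags f;

    f.mode = value;     /* values above 3 would be reduced modulo 4 */
    return f.mode;
}
```

With c declared as int, a byte with all bits set is read as 255 and is not mistaken for the end of the input.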

Assignments of longs to ints 

Bugs may arise from the assignment of long to an int, which will trun- 
cate the contents. This may happen in programs which have been incom- 
pletely converted to use typedefs. When a typedef variable is changed 
from int to long, the program can stop working because some intermediate 
results may be assigned to ints, which are truncated. The -a option can be 
used to suppress messages about the assignment of longs to ints. 

Strange Constructions 

Several perfectly legal, but somewhat strange, constructions are detected 
by lint. The messages hopefully encourage better code quality, clearer style, 
and may even point out bugs. The -h option is used to suppress these 
checks. For example, in the statement 

*p++ ; 

the * does nothing. This provokes the message 

null effect 

from lint. The following program fragment: 

unsigned x ; 
if ( x < 0 ) ... 

results in a test that will never succeed. Similarly, the test 

if ( x > 0 ) ... 

is equivalent to 

if ( x != 0 ) ... 

which may not be the intended action. lint will print the message 

degenerate unsigned comparison 

in these cases. If a program contains something similar to 

if ( 1 != 0 ) ... 

lint will print the message 

constant in conditional context 

since the comparison of 1 with 0 gives a constant result. 

Another construction detected by lint involves operator precedence. 
Bugs which arise from misunderstandings about the precedence of operators 
can be accentuated by spacing and formatting, making such bugs extremely 
hard to find. For example, the statements 

if ( x&077 == 0 ) ... 

and 

x<<2 + 40 



probably do not do what was intended. The best solution is to parenthesize 
such expressions, and lint encourages this by an appropriate message. 
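
To make the grouping concrete, the sketch below writes out with parentheses how each expression is parsed and how it was probably meant. The function names are hypothetical, and the shift example adds 1 rather than 40 so that the shift count stays within the width of an int on any machine.

```c
/* How "x & 077 == 0" is grouped: == binds more tightly than &. */
int low_bits_test_as_parsed(int x)
{
    return x & (077 == 0);
}

/* What was almost certainly intended. */
int low_bits_clear(int x)
{
    return (x & 077) == 0;
}

/* How "x << 2 + 1" is grouped: + binds more tightly than <<. */
unsigned int shift_as_parsed(unsigned int x)
{
    return x << (2 + 1);
}

/* The other reading, written out explicitly. */
unsigned int shift_then_add(unsigned int x)
{
    return (x << 2) + 1;
}
```

For x equal to 0100 (octal), the parsed form yields 0 while the intended test is true, which is exactly the kind of silent disagreement the message is meant to catch.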

Old Syntax 

Several forms of older syntax are now illegal. These fall into two 
classes: assignment operators and initialization. 

The older forms of assignment operators (e.g., =+, =-, ...) could cause 
ambiguous expressions, such as: 

a =-1 ; 

which could be taken as either 

a =- 1 ; 

or 

a = -1 ; 

The situation is especially perplexing if this kind of ambiguity arises as the 
result of a macro substitution. The newer and preferred operators (e.g., +=, 
-=, ...) have no such ambiguities. To encourage the abandonment of the 
older forms, lint prints messages about these old-fashioned operators. 

A similar issue arises with initialization. The older language allowed 

int x 1 ; 

to initialize x to 1. This also caused syntactic difficulties. For example, the 
initialization 

int x ( -1 ) ; 

looks somewhat like the beginning of a function definition: 

int x ( y ) { ... 

and the compiler must read past x in order to determine the correct meaning. 
Again, the problem is even more perplexing when the initializer 
involves a macro. The current syntax places an equals sign between the 
variable and the initializer: 

int x = -1 ; 

This is free of any possible syntactic ambiguity. 

Pointer Alignment 

Certain pointer assignments may be reasonable on some machines and 
illegal on others due entirely to alignment restrictions. lint tries to detect 
cases where pointers are assigned to other pointers and such alignment 
problems might arise. The message 

possible pointer alignment problem 

results from this situation. 



Multiple Uses and Side Effects 

In complicated expressions, the best order in which to evaluate subex- 
pressions may be highly machine dependent. For example, on machines in 
which the stack runs backwards, function arguments will probably be best 
evaluated from right to left. On machines with a stack running forward, left 
to right seems most attractive. Function calls embedded as arguments of 
other functions may or may not be treated similarly to ordinary arguments. 
Similar issues arise with other operators that have side effects, such as the 
assignment operators and the increment and decrement operators. 

In order that the efficiency of C language on a particular machine not be 
unduly compromised, the C language leaves the order of evaluation of com- 
plicated expressions up to the local compiler. In fact, the various C com- 
pilers have considerable differences in the order in which they will evaluate 
complicated expressions. In particular, if any variable is changed by a side 
effect and also used elsewhere in the same expression, the result is explicitly 
undefined. 
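
One way to stay clear of this hazard is to give the side effect a statement of its own. The sketch below uses a hypothetical function, copy_ints, to show the idea.

```c
/*
 * Copies n elements. Writing the loop body as  dst[i] = src[i++];  would
 * both modify i and read it elsewhere in one expression, so the increment
 * is given its own statement instead.
 */
void copy_ints(int *dst, const int *src, int n)
{
    int i = 0;

    while (i < n) {
        dst[i] = src[i];
        i++;
    }
}
```

Written this way, the result no longer depends on the order in which a particular compiler evaluates the two subscripts.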

lint checks for the important special case where a simple scalar variable 
is affected. For example, the statement 

a[i] = b[i++]; 

will cause lint to print the message 

warning: i evaluation order undefined 

in order to call attention to this condition. 



Introduction 

This chapter contains a summary of the grammar and syntax rules of the 
C Programming Language. The implementation described is that found on 
the AT&T line of 3B Computers. A consistent attempt is made to point out 
where other implementations may differ. 



Lexical Conventions 



There are six classes of tokens: identifiers, keywords, constants, string 
literals, operators, and other separators. Blanks, tabs, new-lines, and com- 
ments (collectively, "white space") as described below are ignored except as 
they serve to separate tokens. Some white space is required to separate oth- 
erwise adjacent identifiers, keywords, and constants. 

If the input stream has been parsed into tokens up to a given character, 
the next token is taken to include the longest string of characters that could 
possibly constitute a token. 
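
The longest-token rule explains how a run of operator characters is split. In the sketch below (the function name add_then_bump is hypothetical), the characters a+++b contain no white space, yet the meaning is fixed.

```c
/*
 * "a+++b" is split into the tokens a, ++, +, b, because ++ is the longest
 * token that can be formed after a. The expression therefore means (a++) + b.
 */
int add_then_bump(int a, int b, int *a_after)
{
    int sum = a+++b;

    *a_after = a;       /* a was incremented; b was not */
    return sum;
}
```

The other conceivable reading, a + (++b), is never chosen, since it would require taking the shorter token + first.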



Comments 

The characters /* introduce a comment that terminates with the characters 
*/. Comments do not nest. 



Identifiers (Names) 

An identifier is a sequence of letters and digits. The first character must 
be a letter. The underscore (_) counts as a letter. Uppercase and lowercase 
letters are different. There is no limit on the length of a name. Other 
implementations may collapse case distinctions for external names, and may 
reduce the number of significant characters for both external and non- 
external names. 



Keywords 

The following identifiers are reserved for use as keywords and may not 
be used otherwise: 



asm         default     float       register    switch 
auto        do          for         return      typedef 
break       double      goto        short       union 
case        else        if          sizeof      unsigned 
char        enum        int         static      void 
continue    extern      long        struct      while 



Some implementations also reserve the word fortran. 



Constants 

There are several kinds of constants. Each has a type; an introduction to 
types is given in "Storage Class and Type." 

Integer Constants 

An integer constant consisting of a sequence of digits is taken to be 
octal if it begins with 0 (digit zero). An octal constant consists of the digits 
0 through 7 only. A sequence of digits preceded by 0x or 0X (digit zero) is 
taken to be a hexadecimal integer. The hexadecimal digits include a or A 
through f or F with values 10 through 15. Otherwise, the integer constant 
is taken to be decimal. A decimal constant whose value exceeds the largest 
signed machine integer is taken to be long; an octal or hex constant that 
exceeds the largest unsigned machine integer is likewise taken to be long. 
Otherwise, integer constants are int. 
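
As a small illustration, the hypothetical functions below return the same value written in each of the three notations, followed by a reminder that a leading zero changes the meaning of a constant.

```c
/* The same value, 31, written in the three notations. */
int as_decimal(void) { return 31; }
int as_octal(void)   { return 037; }    /* leading 0: 3*8 + 7 */
int as_hex(void)     { return 0x1F; }   /* leading 0x: 1*16 + 15 */

/* A leading zero changes the value: 010 is eight, not ten. */
int leading_zero(void) { return 010; }
```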

Explicit Long Constants 

A decimal, octal, or hexadecimal integer constant immediately followed 
by l (letter ell) or L is a long constant. As discussed below, on AT&T 3B 
Computers integer and long values may be considered identical. 

Character Constants 

A character constant is a character enclosed in single quotes, as in 'x'. 
The value of a character constant is the numerical value of the character in 
the machine's character set. Certain nongraphic characters, the single quote 
(') and the backslash (\), may be represented according to the table of escape 
sequences shown in Figure 17-1: 

new-line           NL (LF)    \n 
horizontal tab     HT         \t 
vertical tab       VT         \v 
backspace          BS         \b 
carriage return    CR         \r 
form feed          FF         \f 
backslash          \          \\ 
single quote       '          \' 
bit pattern        ddd        \ddd 





Figure 17-1: Escape Sequences for Nongraphic Characters 



The escape \ddd consists of the backslash followed by 1, 2, or 3 octal 
digits that are taken to specify the value of the desired character. A special 
case of this construction is \0 (not followed by a digit), which indicates the 
ASCII character NUL. If the character following a backslash is not one of 
those specified, the behavior is undefined. An explicit new-line character is 
illegal in a character constant. The type of a character constant is int. 
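
A few of these properties can be checked directly. The function names below are hypothetical, and the last one assumes an ASCII character set, as on the 3B Computers.

```c
/* Character constants have type int, so this is sizeof(int) in C. */
unsigned long char_constant_size(void)
{
    return (unsigned long) sizeof('x');
}

/* '\0' is the NUL character, with value zero. */
int nul_value(void) { return '\0'; }

/* An octal escape names a character by value; 101 octal is 65. */
int octal_escape_value(void) { return '\101'; }

/* On an ASCII system (an assumption here) that character is 'A'. */
int is_ascii_capital_a(void) { return '\101' == 'A'; }
```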

Floating Constants 

A floating constant consists of an integer part, a decimal point, a frac- 
tion part, an e or E, and an optionally signed integer exponent. The integer 
and fraction parts both consist of a sequence of digits. Either the integer 
part or the fraction part (not both) may be missing. Either the decimal 
point or the e and the exponent (not both) may be missing. Every floating 
constant has type double. 

Enumeration Constants 

Names declared as enumerators (see "Structure, Union, and Enumeration 
Declarations" under "Declarations") have type int. 

String Literals 

A string literal is a sequence of characters surrounded by double quotes, 
as in "...". A string literal has type "array of char" and storage class static 
(see "Storage Class and Type") and is initialized with the given characters. 
The compiler places a null byte (\0) at the end of each string literal so that 
programs that scan the string literal can find its end. In a string literal, the 
double quote character (") must be preceded by a \; in addition, the same 
escapes as described for character constants may be used. 

A \ and the immediately following new-line are ignored. All string 
literals, even when written identically, are distinct. 
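
The rules about the terminating null byte, the escaped double quote, and the backslash before a new-line can be seen in a short sketch. All function names here are hypothetical.

```c
#include <string.h>

/* "abc" occupies four bytes: three characters plus the terminating null. */
unsigned long literal_size(void)   { return (unsigned long) sizeof("abc"); }
unsigned long literal_length(void) { return (unsigned long) strlen("abc"); }

/* \" embeds a double quote; the literal below holds  say "hi"  (8 chars). */
unsigned long quoted_length(void)  { return (unsigned long) strlen("say \"hi\""); }

/* A backslash followed by a new-line is removed, joining the two lines. */
const char *joined(void)
{
    return "one \
two";
}
```

Note that the continuation line in joined starts in the first column; any leading blanks there would become part of the string.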



Syntax Notation 

Syntactic categories are indicated by italic type and literal words and 
characters by bold type. Alternative categories are listed on separate lines. 
An optional entry is indicated by the subscript "opt," so that 

{ expression_opt } 

indicates an optional expression enclosed in braces. The syntax is summarized 
in "Syntax Summary" at the end of the chapter. 



Storage Class and Type 

The C language bases the interpretation of an identifier upon two attributes 
of the identifier: its storage class and its type. The storage class 
determines the location and lifetime of the storage associated with an 
identifier; the type determines the meaning of the values found in the 
identifier's storage. 



Storage Class 

There are four declarable storage classes: 

■ automatic 

■ static 

■ external 

■ register 

Automatic variables are local to each invocation of a block (see "Compound 
Statement or Block" in "Statements") and are discarded upon exit from the 
block. Static variables are local to a block but retain their values upon reen- 
try to a block even after control has left the block. External variables exist 
and retain their values throughout the execution of the entire program and 
may be used for communication between functions, even separately com- 
piled functions. Register variables are (if possible) stored in the fast regis- 
ters of the machine; like automatic variables, they are local to each block 
and disappear on exit from the block. 
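
The difference in lifetime among the storage classes can be sketched as follows. The names total_calls, next_id, and doubled are hypothetical and serve only to illustrate the three behaviors.

```c
/* External: visible to other functions and files, lives for the whole run. */
int total_calls = 0;

/* Static local: private to next_id, but keeps its value between calls. */
int next_id(void)
{
    static int id = 0;

    total_calls++;
    return ++id;
}

/* Automatic: a fresh "scratch" exists for each invocation of the block. */
int doubled(int value)
{
    int scratch = value * 2;

    return scratch;
}
```

Successive calls to next_id return 1, 2, 3, and so on, because id is not reinitialized on reentry to the function.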



Type 

The C language supports several fundamental types of objects. Objects 
declared as characters (char) are large enough to store any member of the 
implementation's character set. If a genuine character from that character 
set is stored in a char variable, its value is equivalent to the integer code for 
that character. Other quantities may be stored into character variables, but 
the implementation is machine dependent. In particular, char may be 
signed or unsigned by default. In this implementation the default is 
unsigned. 

Up to three sizes of integer, declared short int, int, and long int, are 
available. Longer integers provide no less storage than shorter ones, but 
the implementation may make either short integers or long integers, or 
both, equivalent to plain integers. Plain integers have the natural size sug- 
gested by the host machine architecture. The other sizes are provided to 
meet special needs. The sizes for the AT&T 3B Computer are shown in Fig- 
ure 17-2. 



                 AT&T 3B Computer 
                 (ASCII) 

char             8 bits 
int              32 
short            16 
long             32 
float            32 
double           64 
float range      ±10 
double range     ±10 



Figure 17-2: AT&T 3B Computer Hardware Characteristics 



The properties of enum types (see "Structure, Union, and Enumeration 
Declarations" under "Declarations") are identical to those of some integer 
types. The implementation may use the range of values to determine how 
to allot storage. 

Unsigned integers, declared unsigned, obey the laws of arithmetic 
modulo 2^n where n is the number of bits in the representation. 
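
Arithmetic modulo 2^n means that unsigned results wrap around rather than overflow. A minimal sketch, using hypothetical function names and the standard macro UINT_MAX (which is 2^n - 1 for unsigned int):

```c
#include <limits.h>

/* Subtracting past zero wraps to the largest value: 0 - 1 == 2^n - 1. */
unsigned int wrap_below_zero(void)
{
    unsigned int zero = 0;

    return zero - 1u;
}

/* Adding past the maximum wraps to zero: (2^n - 1) + 1 == 0. */
unsigned int wrap_above_max(void)
{
    unsigned int max = UINT_MAX;

    return max + 1u;
}
```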

Single-precision floating point (float) and double precision floating 
point (double) may be synonymous in some implementations. 

Because objects of the foregoing types can usefully be interpreted as 
numbers, they will be referred to as arithmetic types. Char, int of all sizes 
whether unsigned or not, and enum will collectively be called integral 
types. The float and double types will collectively be called floating types. 

The void type specifies an empty set of values. It is used as the type 
returned by functions that generate no value. 

Besides the fundamental arithmetic types, there is a conceptually 
infinite class of derived types constructed from the fundamental types in 
the following ways: 

■ arrays of objects of most types 

■ functions that return objects of a given type 

■ pointers to objects of a given type 

■ structures containing a sequence of objects of various types 

■ unions capable of containing any one of several objects of various 
types 

In general these methods of constructing objects can be applied recursively. 

Objects and lvalues 

An object is a manipulatable region of storage. An lvalue is an expres- 
sion referring to an object. An obvious example of an lvalue expression is 
an identifier. There are operators that yield lvalues: for example, if E is an 
expression of pointer type, then *E is an lvalue expression referring to the 
object to which E points. The name "lvalue" comes from the assignment 
expression E1 = E2 in which the left operand E1 must be an lvalue expression. 
The discussion of each operator below indicates whether it expects 
lvalue operands and whether it yields an lvalue. 






Operator Conversions 



A number of operators may, depending on their operands, cause conver- 
sion of the value of an operand from one type to another. This part 
explains the result to be expected from such conversions. The conversions 
demanded by most ordinary operators are summarized under "Arithmetic 
Conversions." The summary will be supplemented as required by the dis- 
cussion of each operator. 



Characters and Integers 

A character or a short integer may be used wherever an integer may be 
used. In all cases the value is converted to an integer. Conversion of a shorter 
integer to a longer integer preserves sign. On the 3B Computers sign exten- 
sion of char variables does not occur. It is guaranteed that a member of the 
standard character set is non-negative. 

On machines that treat characters as signed, the characters of the ASCII 
set are all non-negative. However, a character constant specified with an 
octal escape suffers sign extension and may appear negative; for example, 
'\377' has the value -1. 

When a longer integer is converted to a shorter integer or to a char, it is 
truncated on the left. Excess bits are simply discarded. 
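
Truncation on the left can be illustrated with unsigned types, for which the result is the same on every machine. The function name low_byte is hypothetical, and the values in the comment assume an 8-bit char, as on the 3B Computers.

```c
/*
 * Keeps only the low-order bits that fit in the destination. With an
 * 8-bit unsigned char (assumed here), 0x1234 becomes 0x34.
 */
unsigned char low_byte(unsigned long wide)
{
    return (unsigned char) wide;
}
```

The high-order bits are simply dropped; no rounding or range check takes place.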



Float and Double 

All floating arithmetic in C is carried out in double precision. Whenever 
a float appears in an expression it is lengthened to double by zero padding 
its fraction. When a double must be converted to float, for example 
by an assignment, the double is rounded before truncation to float length. 
The result is undefined if it cannot be represented as a float. 

Floating and Integral 

Conversions of floating values to integral type are rather machine 
dependent. In particular, the direction of truncation of negative numbers 
varies. The result is undefined if it will not fit in the space provided. 






Conversions of integral values to floating type behave well. Some loss 
of accuracy occurs if the destination lacks sufficient bits. 



Pointers and Integers 

An expression of integral type may be added to or subtracted from a 
pointer; in such a case, the first is converted as specified in the discussion of 
the addition operator. Two pointers to objects of the same type may be sub- 
tracted; in this case, the result is converted to an integer as specified in the 
discussion of the subtraction operator. 



Unsigned 

Whenever an unsigned integer and a plain integer are combined, the 
plain integer is converted to unsigned and the result is unsigned. The 
value is the least unsigned integer congruent to the signed integer (modulo 
2^wordsize). In a 2's complement representation, this conversion is conceptual, 
and there is no actual change in the bit pattern. 

When an unsigned short integer is converted to long, the value of the 
result is the same numerically as that of the unsigned integer. Thus, the 
conversion amounts to padding with zeros on the left. 
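
The consequences of these rules are sometimes surprising, as the sketch below suggests. The function names are hypothetical, and the casts spell out the conversions that the compiler would otherwise apply implicitly.

```c
#include <limits.h>

/* -1 converted to unsigned is the largest unsigned value. */
unsigned int minus_one_as_unsigned(void)
{
    int minus_one = -1;

    return (unsigned int) minus_one;
}

/*
 * In a mixed comparison the plain integer is converted to unsigned, so
 * comparing -1 with 1u really compares UINT_MAX with 1 and is false.
 */
int minus_one_less_than_one(void)
{
    int minus_one = -1;
    unsigned int one = 1u;

    return (unsigned int) minus_one < one;
}

/* Widening an unsigned short pads with zeros; the value is unchanged. */
long widen(unsigned short narrow)
{
    return (long) narrow;
}
```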



Arithmetic Conversions 

A great many operators cause conversions and yield result types in a 
similar way. This pattern will be called the "usual arithmetic conversions." 

1. First, any operands of type char or short are converted to int, and 
   any operands of type unsigned char or unsigned short are converted 
   to unsigned int. 

2. Then, if either operand is double, the other is converted to double 
   and that is the type of the result. 

3. Otherwise, if either operand is unsigned long, the other is converted 
   to unsigned long and that is the type of the result. 

4. Otherwise, if one operand is long, and the other is unsigned int, 
   they are both converted to unsigned long and that is the type of the 
   result. 

5. Otherwise, if either operand is long, the other is converted to long 
   and that is the type of the result. 

6. Otherwise, if either operand is unsigned, the other is converted to 
   unsigned and that is the type of the result. 

7. Otherwise, both operands must be int, and that is the type of the 
   result. 



Void 

The (nonexistent) value of a void object may not be used in any way, 
and neither explicit nor implicit conversion may be applied. Because a void 
expression denotes a nonexistent value, such an expression may be used 
only as an expression statement (see "Expression Statement" under "State- 
ments") or as the left operand of a comma expression (see "Comma Opera- 
tor" under "Expressions"). 

An expression may be converted to type void by use of a cast. For 
example, this makes explicit the discarding of the value of a function call 
used as an expression statement. 






Expressions and Operators 



The precedence of expression operators is the same as the order of the 
major subsections of this section, highest precedence first. Thus, for exam- 
ple, the expressions referred to as the operands of + (see "Additive Opera- 
tors") are those expressions defined under "Primary Expressions", "Unary 
Operators", and "Multiplicative Operators". Within each subpart, the opera- 
tors have the same precedence. Left- or right-associativity is specified in 
each subsection for the operators discussed therein. The precedence and 
associativity of all the expression operators are summarized in the grammar 
of "Syntax Summary". 

Otherwise, the order of evaluation of expressions is undefined. In particular, 
the compiler considers itself free to compute subexpressions in the 
order it believes most efficient even if the subexpressions involve side 
effects. Expressions involving a commutative and associative operator (*, +, 
&, |, ^) may be rearranged arbitrarily even in the presence of parentheses; 
to force a particular order of evaluation, an explicit temporary must be used. 

The handling of overflow and divide check in expression evaluation is 
undefined. Most existing implementations of C ignore integer overflows; 
treatment of division by 0 and all floating-point exceptions varies between 
machines and is usually adjustable by a library function. 



Primary Expressions 

Primary expressions involving ., ->, subscripting, and function calls 
group left to right. 

primary-expression: 
    identifier 
    constant 
    string literal 
    ( expression ) 
    primary-expression [ expression ] 
    primary-expression ( expression-list_opt ) 
    primary-expression . identifier 
    primary-expression -> identifier 

expression-list: 
    expression 
    expression-list , expression 

An identifier is a primary expression provided it has been suitably declared 
as discussed below. Its type is specified by its declaration. If the type of 
the identifier is "array of ...", then the value of the identifier expression is a 
pointer to the first object in the array; and the type of the expression is 
"pointer to ...". Moreover, an array identifier is not an lvalue expression. 
Likewise, an identifier that is declared "function returning ...", when used 
except in the function-name position of a call, is converted to "pointer to 
function returning ...". 

A constant is a primary expression. Its type may be int, long, or double 
depending on its form. Character constants have type int and floating con- 
stants have type double. 

A string literal is a primary expression. Its type is originally "array of 
char", but following the same rule given above for identifiers, this is 
modified to "pointer to char" and the result is a pointer to the first character 
in the string literal. (There is an exception in certain initializers; see "Ini- 
tialization" under "Declarations.") 

A parenthesized expression is a primary expression whose type and 
value are identical to those of the unadorned expression. The presence of 
parentheses does not affect whether the expression is an lvalue. 

A primary expression followed by an expression in square brackets is a
primary expression. The intuitive meaning is that of a subscript. Usually,
the primary expression has type "pointer to ...", the subscript expression is
int, and the type of the result is "...". The expression E1[E2] is identical
(by definition) to *((E1)+(E2)). All the clues needed to understand this
notation are contained in this subpart together with the discussions in
"Unary Operators" and "Additive Operators" on identifiers, *, and +,
respectively. The implications are summarized under "Arrays, Pointers, and
Subscripting" under "Types Revisited."

A function call is a primary expression followed by parentheses containing
a possibly empty, comma-separated list of expressions that constitute the
actual arguments to the function. The primary expression must be of type
"function returning ...", and the result of the function call is of type
"...". As indicated below, a hitherto unseen identifier followed immediately
by a left parenthesis is contextually declared to represent a function
returning an integer.

Any actual arguments of type float are converted to double before the 
call. Any of type char or short are converted to int. Array names are con- 
verted to pointers. No other conversions are performed automatically; in 
particular, the compiler does not compare the types of actual arguments 
with those of formal arguments. If conversion is needed, use a cast; see
"Unary Operators" and "Type Names" under "Declarations."

In preparing for the call to a function, a copy is made of each actual 
parameter. Thus, all argument passing in C is strictly by value. A function 
may change the values of its formal parameters, but these changes cannot 
affect the values of the actual parameters. It is possible to pass a pointer on 
the understanding that the function may change the value of the object to 
which the pointer points. An array name is a pointer expression. The 
order of evaluation of arguments is undefined by the language; take note 
that the various compilers differ. Recursive calls to any function are permit- 
ted. 

A primary expression followed by a dot followed by an identifier is an 
expression. The first expression must be a structure or a union, and the 
identifier must name a member of the structure or union. The value is the 
named member of the structure or union, and it is an lvalue if the first 
expression is an lvalue. 

A primary expression followed by an arrow (built from - and >) followed
by an identifier is an expression. The first expression must be a
pointer to a structure or a union and the identifier must name a member of
that structure or union. The result is an lvalue referring to the named
member of the structure or union to which the pointer expression points.
Thus the expression E1->MOS is the same as (*E1).MOS. Structures and
unions are discussed in "Structure, Union, and Enumeration Declarations"
under "Declarations."



Unary Operators

Expressions with unary operators group right to left. 

unary-expression:
* expression
& lvalue
- expression
! expression
~ expression
++ lvalue
-- lvalue
lvalue ++
lvalue --
( type-name ) expression
sizeof expression
sizeof ( type-name )

The unary * operator means "indirection"; the expression must be a pointer,
and the result is an lvalue referring to the object to which the expression
points. If the type of the expression is "pointer to ...", the type of the
result is "...".

The result of the unary & operator is a pointer to the object referred to
by the lvalue. If the type of the lvalue is "...", the type of the result is
"pointer to ...".

The result of the unary - operator is the negative of its operand. The
usual arithmetic conversions are performed. The negative of an unsigned
quantity is computed by subtracting its value from 2^n, where n is the
number of bits in the corresponding signed type.

There is no unary + operator.

The result of the logical negation operator ! is one if the value of its 
operand is zero, zero if the value of its operand is nonzero. The type of the 
result is int. It is applicable to any arithmetic type or to pointers. 

The ~ operator yields the one's complement of its operand. The usual
arithmetic conversions are performed. The type of the operand must be
integral.

The object referred to by the lvalue operand of prefix ++ is incremented.
The value is the new value of the operand but is not an lvalue.
The expression ++x is equivalent to x += 1. See the discussions "Additive
Operators" and "Assignment Operators" for information on conversions.

The lvalue operand of prefix -- is decremented analogously to the
prefix ++ operator.

When postfix ++ is applied to an lvalue, the result is the value of the 
object referred to by the lvalue. After the result is noted, the object is incre- 
mented in the same manner as for the prefix ++ operator. The type of the 
result is the same as the type of the lvalue expression. 

When postfix -- is applied to an lvalue, the result is the value of the
object referred to by the lvalue. After the result is noted, the object is
decremented in the same manner as for the prefix -- operator. The type of
the result is the same as the type of the lvalue expression.

An expression preceded by the parenthesized name of a data type causes 
conversion of the value of the expression to the named type. This construc- 
tion is called a cast. Type names are described in "Type Names" under 
"Declarations." 

The sizeof operator yields the size in bytes of its operand. (A byte is 
undefined by the language except in terms of the value of sizeof. However, 
in all existing implementations, a byte is the space required to hold a char.) 
When applied to an array, the result is the total number of bytes in the 
array. The size is determined from the declarations of the objects in the 
expression. This expression is semantically an unsigned constant and may 
be used anywhere a constant is required. Its major use is in communication 
with routines like storage allocators and I/O systems. 

The sizeof operator may also be applied to a parenthesized type name. 
In that case it yields the size in bytes of an object of the indicated type. 

The construction sizeof(type) is taken to be a unit, so the expression
sizeof(type)-2 is the same as (sizeof(type))-2.



Multiplicative Operators 

The multiplicative operators *, /, and % group left to right. The usual
arithmetic conversions are performed.






multiplicative-expression:

expression * expression
expression / expression
expression % expression

The binary * operator indicates multiplication. The * operator is associative, 
and expressions with several multiplications at the same level may be rear- 
ranged by the compiler. The binary / operator indicates division. 

The binary % operator yields the remainder from the division of the 
first expression by the second. The operands must be integral. 

When positive integers are divided, truncation is toward 0; but the form 
of truncation is machine-dependent if either operand is negative. On all 
machines covered by this manual, the remainder has the same sign as the 
dividend. It is always true that (a/b)*b + a%b is equal to a (if b is not 0). 

Additive Operators 

The additive operators + and - group left to right. The usual arithmetic
conversions are performed. There are some additional type possibilities
for each operator.

additive-expression:

expression + expression
expression - expression

The result of the + operator is the sum of the operands. A pointer to an
object in an array and a value of any integral type may be added. The latter 
is in all cases converted to an address offset by multiplying it by the length 
of the object to which the pointer points. The result is a pointer of the 
same type as the original pointer that points to another object in the same 
array, appropriately offset from the original object. Thus if P is a pointer to 
an object in an array, the expression P+1 is a pointer to the next object in 
the array. No further type combinations are allowed for pointers. 

The + operator is associative, and expressions with several additions at
the same level may be rearranged by the compiler.

The result of the - operator is the difference of the operands. The usual
arithmetic conversions are performed. Additionally, a value of any integral
type may be subtracted from a pointer, and then the same conversions for
addition apply.

If two pointers to objects of the same type are subtracted, the result is 
converted (by division by the length of the object) to an int representing 
the number of objects separating the pointed-to objects. This conversion 
will in general give unexpected results unless the pointers point to objects 
in the same array, since pointers, even to objects of the same type, do not 
necessarily differ by a multiple of the object length. 



Shift Operators 

The shift operators << and >> group left to right. Both perform the
usual arithmetic conversions on their operands, each of which must be 
integral. Then the right operand is converted to int; the type of the result 
is that of the left operand. The result is undefined if the right operand is 
negative or greater than or equal to the length of the object in bits. 

shift-expression: 

expression << expression 
expression >> expression 

The value of E1<<E2 is E1 (interpreted as a bit pattern) left-shifted E2 bits.
Vacated bits are 0 filled. The value of E1>>E2 is E1 right-shifted E2 bit
positions. The right shift is guaranteed to be logical (0 fill) if E1 is
unsigned; otherwise, it may be arithmetic.



Relational Operators 

The relational operators group left to right. 

relational-expression: 

expression < expression 
expression > expression 
expression <= expression 
expression >= expression

The operators < (less than), > (greater than), <= (less than or equal to),
and >= (greater than or equal to) all yield 0 if the specified relation is
false and 1 if it is true. The type of the result is int. The usual arithmetic
conversions are performed. Two pointers may be compared; the result
depends on the relative locations in the address space of the pointed-to
objects. Pointer comparison is portable only when the pointers point to
objects in the same array.



Equality Operators 

equality-expression: 

expression == expression 
expression != expression

The == (equal to) and the != (not equal to) operators are exactly analogous
to the relational operators except for their lower precedence. (Thus
a<b == c<d is 1 whenever a<b and c<d have the same truth value.)

A pointer may be compared to an integer only if the integer is the
constant 0. A pointer to which 0 has been assigned is guaranteed not to point
to any object and will appear to be equal to 0. In conventional usage, such
a pointer is considered to be null.



Bitwise AND Operator 

and-expression: 

expression & expression 

The & operator is associative, and expressions involving & may be rear- 
ranged. The usual arithmetic conversions are performed. The result is the 
bitwise AND function of the operands. The operator applies only to 
integral operands. 

Bitwise Exclusive OR Operator 

exclusive-or-expression: 

expression ^ expression

The ^ operator is associative, and expressions involving ^ may be
rearranged. The usual arithmetic conversions are performed; the result is the
bitwise exclusive OR function of the operands. The operator applies only to
integral operands.



Bitwise Inclusive OR Operator

inclusive-or-expression:

expression | expression

The | operator is associative, and expressions involving | may be
rearranged. The usual arithmetic conversions are performed; the result is the
bitwise inclusive OR function of its operands. The operator applies only to
integral operands.



Logical AND Operator 

logical-and-expression: 

expression && expression 

The && operator groups left to right. It returns 1 if both its operands
evaluate to nonzero, 0 otherwise. Unlike &, && guarantees left to right
evaluation; moreover, the second operand is not evaluated if the first
operand evaluates to 0.

The operands need not have the same type, but each must have one of 
the fundamental types or be a pointer. The result is always int. 



Logical OR Operator 

logical-or-expression: 

expression || expression

The || operator groups left to right. It returns 1 if either of its operands
evaluates to nonzero, 0 otherwise. Unlike |, || guarantees left to right
evaluation; moreover, the second operand is not evaluated if the value of
the first operand evaluates to nonzero.

The operands need not have the same type, but each must have one of 
the fundamental types or be a pointer. The result is always int. 






Conditional Operator 

conditional-expression: 

expression ? expression : expression 

Conditional expressions group right to left. The first expression is
evaluated; and if it is nonzero, the result is the value of the second
expression, otherwise that of the third expression. If possible, the usual
arithmetic conversions are performed to bring the second and third
expressions to a common type. If both are structures or unions of the same
type, the result has the type of the structure or union. If both are pointers
of the same type, the result has the common type. Otherwise, one must be a
pointer and the other the constant 0, and the result has the type of the
pointer. Only one of the second and third expressions is evaluated.



Assignment Operators 

There are a number of assignment operators, all of which group right to 
left. All require an lvalue as their left operand, and the type of an assign- 
ment expression is that of its left operand. The value is the value stored in 
the left operand after the assignment has taken place. The two parts of a
compound assignment operator are separate tokens.

assignment-expression:
lvalue = expression
lvalue += expression
lvalue -= expression
lvalue *= expression
lvalue /= expression
lvalue %= expression
lvalue >>= expression
lvalue <<= expression
lvalue &= expression
lvalue ^= expression
lvalue |= expression

In the simple assignment with =, the value of the expression replaces
that of the object referred to by the lvalue. If both operands have
arithmetic type, the right operand is converted to the type of the left
preparatory to the assignment. Second, both operands may be structures or
unions of the same type. Finally, if the left operand is a pointer, the right
operand must in general be a pointer of the same type. However, the constant
0 may be assigned to a pointer; it is guaranteed that this value will produce
a null pointer distinguishable from a pointer to any object.

The behavior of an expression of the form E1 op= E2 may be inferred
by taking it as equivalent to E1 = E1 op (E2); however, E1 is evaluated only
once. In += and -=, the left operand may be a pointer, in which case the
(integral) right operand is converted as explained in "Additive Operators."
All right operands and all nonpointer left operands must have arithmetic
type.



Comma Operator 

comma-expression: 

expression , expression 

A pair of expressions separated by a comma is evaluated left to right, and 
the value of the left expression is discarded. The type and value of the 
result are the type and value of the right operand. This operator groups left 
to right. In contexts where comma is given a special meaning, e.g., in lists 
of actual arguments to functions (see "Primary Expressions") and lists of ini- 
tializers (see "Initialization" under "Declarations"), the comma operator as 
described in this subpart can only appear in parentheses. For example, 

f(a, (t=3, t+2), c) 

has three arguments, the second of which has the value 5.



Declarations

Declarations are used to specify the interpretation that C gives to each
identifier; they do not necessarily reserve storage associated with the
identifier. Declarations have the form

declaration: 

decl-specifiers declarator-list_opt ;

The declarators in the declarator-list contain the identifiers being declared.
The decl-specifiers consist of a sequence of type and storage class specifiers.

decl-specifiers:

type-specifier decl-specifiers_opt
sc-specifier decl-specifiers_opt

The list must be self-consistent in a way described below. 



Storage Class Specifiers 

The sc-specifiers are: 

sc-specifier: 
auto 
static 
extern 
register 
typedef 

The typedef specifier does not reserve storage and is called a "storage class 
specifier" only for syntactic convenience. See "typedef" for more informa- 
tion. The meanings of the various storage classes were discussed in 
"Names." 

The auto, static, and register declarations also serve as definitions in 
that they cause an appropriate amount of storage to be reserved. In the 
extern case, there must be an external definition (see "External Definitions") 
for the given identifiers somewhere outside the function in which they are 
declared.

A register declaration is best thought of as an auto declaration, together
with a hint to the compiler that the variables declared will be heavily used. 
Only the first few such declarations in each function are effective. More- 
over, only variables of certain types will be stored in registers. One other 
restriction applies to variables declared using register storage class: the 
address-of operator, &, cannot be applied to them. Smaller, faster programs 
can be expected if register declarations are used appropriately. 

At most, one sc-specifier may be given in a declaration. If the sc- 
specifier is missing from a declaration, it is taken to be auto inside a func- 
tion, extern outside. Exception: functions are never automatic. 



Type Specifiers 

The type-specifiers are 

type-specifier:
basic-type-specifier
struct-or-union-specifier
typedef-name
enum-specifier

basic-type-specifier:
basic-type
basic-type basic-type-specifiers

basic-type:
char
short
int
long
unsigned
float
double
void

At most one of the words long or short may be specified in conjunction 
with int; the meaning is the same as if int were not mentioned. The word 
long may be specified in conjunction with float; the meaning is the same as 
double. The word unsigned may be specified alone, or in conjunction with 
int or any of its short or long varieties, or with char.

Otherwise, at most one type-specifier may be given in a declaration. In
particular, adjectival use of long, short, or unsigned is not permitted with 
typedef names. If the type-specifier is missing from a declaration, it is 
taken to be int. 

Specifiers for structures, unions, and enumerations are discussed in 
"Structure, Union, and Enumeration Declarations." Declarations with 
typedef names are discussed in "typedef." 



Declarators 

The declarator-list appearing in a declaration is a comma-separated 
sequence of declarators, each of which may have an initializer: 

declarator-list:

init-declarator
init-declarator , declarator-list

init-declarator:

declarator initializer_opt

Initializers are discussed in "Initialization." The specifiers in the declaration 
indicate the type and storage class of the objects to which the declarators 
refer. Declarators have the syntax: 

declarator:

identifier
( declarator )
* declarator
declarator ( )
declarator [ constant-expression_opt ]

The grouping is the same as in expressions.



Meaning of Declarators 

Each declarator is taken to be an assertion that when a construction of 
the same form as the declarator appears in an expression, it yields an object 
of the indicated type and storage class.

Each declarator contains exactly one identifier; it is this identifier that is
declared. If an unadorned identifier appears as a declarator, then it has the 
type indicated by the specifier heading the declaration. 

A declarator in parentheses is identical to the unadorned declarator, but 
the binding of complex declarators may be altered by parentheses. See the 
examples below. 

Now imagine a declaration 

T D1

where T is a type-specifier (like int, etc.) and D1 is a declarator. Suppose
this declaration makes the identifier have type "... T", where the "..." is
empty if D1 is just a plain identifier (so that the type of x in "int x" is
just int). Then if D1 has the form

*D

the type of the contained identifier is "... pointer to T."

If D1 has the form

D()

then the contained identifier has the type "... function returning T."

If D1 has the form

D[constant-expression]

or

D[]

then the contained identifier has type "... array of T." In the first case, the 
constant expression is an expression whose value is determinable at compile 
time, whose type is int, and whose value is positive. (Constant expressions 
are defined precisely in "Constant Expressions.") When several "array of" 
specifications are adjacent, a multi-dimensional array is created; the constant 
expressions that specify the bounds of the arrays may be missing only for 
the first member of the sequence. This elision is useful when the array is 
external and the actual definition, which allocates storage, is given else- 
where. The first constant expression may also be omitted when the declara- 
tor is followed by initialization. In this case the size is calculated from the 
number of initial elements supplied.

An array may be constructed from one of the basic types, from a
pointer, from a structure or union, or from another array (to generate a 
multi-dimensional array). 

Not all the possibilities allowed by the syntax above are actually permit- 
ted. The restrictions are as follows: functions may not return arrays or 
functions although they may return pointers; there are no arrays of func- 
tions although there may be arrays of pointers to functions. Likewise, a 
structure or union may not contain a function; but it may contain a pointer 
to a function. 

As an example, the declaration 

int i, *ip, f(), *fip(), (*pfi)();

declares an integer i, a pointer ip to an integer, a function f returning an
integer, a function fip returning a pointer to an integer, and a pointer pfi to
a function, which returns an integer. It is especially useful to compare the
last two. The binding of *fip() is *(fip()). The declaration suggests, and the
same construction in an expression requires, the calling of a function fip,
and then using indirection through the (pointer) result to yield an integer.
In the declarator (*pfi)(), the extra parentheses are necessary, as they are
also in an expression, to indicate that indirection through a pointer to a
function yields a function, which is then called; it returns an integer.

As another example, 

float fa[17], *afp[17];

declares an array of float numbers and an array of pointers to float 
numbers. Finally, 

static int x3d[3][5][7];

declares a static 3-dimensional array of integers, with rank 3x5x7. In
complete detail, x3d is an array of three items; each item is an array of five
arrays; each of the latter arrays is an array of seven integers. Any of the
expressions x3d, x3d[i], x3d[i][j], x3d[i][j][k] may reasonably appear in an
expression. The first three have type "array" and the last has type int.



Structure and Union Declarations

A structure is an object consisting of a sequence of named members. 
Each member may have any type. A union is an object that may, at a given 
time, contain any one of several members. Structure and union specifiers 
have the same form. 

struct-or-union-specifier: 

struct-or-union { struct-decl-list }
struct-or-union identifier { struct-decl-list }
struct-or-union identifier 

struct-or-union: 
struct 
union 

The struct-decl-list is a sequence of declarations for the members of the 
structure or union: 

struct-decl-list: 

struct-declaration 
struct-declaration struct-decl-list 



struct-declaration: 

type-specifier struct-declarator-list ; 

struct-declarator-list: 
struct-declarator 

struct-declarator , struct-declarator-list 

In the usual case, a struct-declarator is just a declarator for a member of a 
structure or union. A structure member may also consist of a specified 
number of bits. Such a member is also called a field; its length, a non- 
negative constant expression, is set off from the field name by a colon. 

struct-declarator: 
declarator 

declarator : constant-expression 
: constant-expression

Within a structure, the objects declared have addresses that increase as
the declarations are read left to right. Each non-field member of a structure
begins on an addressing boundary appropriate to its type; therefore, there
may be unnamed holes in a structure. Field members are packed into
machine integers; they do not straddle words. A field that does not fit into
the space remaining in a word is put into the next word. No field may be 
wider than a word. (See Figure 17-2 for sizes of basic types on 3B Comput- 
ers.) 

A struct-declarator with no declarator, only a colon and a width,
indicates an unnamed field useful for padding to conform to externally-imposed
layouts. As a special case, a field with a width of 0 specifies alignment of
the next field at an implementation-dependent boundary.

The language does not restrict the types of things that are declared as
fields. Moreover, even int fields may be considered to be unsigned. For
these reasons, it is strongly recommended that fields be declared as
unsigned where that is the intent. There are no arrays of fields, and the
address-of operator, &, may not be applied to them, so that there are no
pointers to fields.

A union may be thought of as a structure all of whose members begin at
offset 0 and whose size is sufficient to contain any of its members. At most,
one of the members can be stored in a union at any time.

A structure or union specifier of the second form, that is, one of 

struct identifier { struct-decl-list }
union identifier { struct-decl-list }

declares the identifier to be the structure tag (or union tag) of the structure
specified by the list. A subsequent declaration may then use the third form
of specifier, one of

struct identifier 
union identifier 

Structure tags allow definition of self-referential structures. Structure 
tags also permit the long part of the declaration to be given once and used 
several times. It is illegal to declare a structure or union that contains an 
instance of itself, but a structure or union may contain a pointer to an 
instance of itself.

The third form of a structure or union specifier may be used prior to a
declaration that gives the complete specification of the structure or union in 
situations in which the size of the structure or union is unnecessary. The 
size is unnecessary in two situations: when a pointer to a structure or union 
is being declared and when a typedef name is declared to be a synonym for 
a structure or union. This, for example, allows the declaration of a pair of 
structures that contain pointers to each other. 

The names of members and tags do not conflict with each other or with 
ordinary variables. A particular name may not be used twice in the same 
structure, but the same name may be used in several different structures in 
the same scope. 

A simple but important example of a structure declaration is the follow- 
ing binary tree structure: 



struct tnode
{
    char tword[20];
    int count;
    struct tnode *left;
    struct tnode *right;
};

which contains an array of 20 characters, an integer, and two pointers to 
similar structures. Once this declaration has been given, the declaration 

struct tnode s, *sp;

declares s to be a structure of the given sort and sp to be a pointer to a
structure of the given sort. With these declarations, the expression

sp->count

refers to the count field of the structure to which sp points;

s.left

refers to the left subtree pointer of the structure s; and

s.right->tword[0]

refers to the first character of the tword member of the right subtree of s.

Enumeration Declarations 

Enumeration variables and constants have integral type. 

enum-specifier: 

enum { enum-list } 

enum identifier { enum-list }

enum identifier 

enum-list: 

enumerator 

enum-list , enumerator 

enumerator: 

identifier 

identifier = constant-expression 

The identifiers in an enum-list are declared as constants and may appear 
wherever constants are required. If no enumerators with = appear, then the 
values of the corresponding constants begin at 0 and increase by 1 as the
declaration is read from left to right. An enumerator with = gives the asso- 
ciated identifier the value indicated; subsequent identifiers continue the 
progression from the assigned value. 

The names of enumerators in the same scope must all be distinct from 
each other and from those of ordinary variables. 

The role of the identifier in the enum-specifier is entirely analogous to
that of the structure tag in a struct-specifier; it names a particular
enumeration. For example,

enum color { chartreuse, burgundy, claret=20, winedark };

enum color *cp, col;

col = claret;
cp = &col;

if (*cp == burgundy) ...

makes color the enumeration-tag of a type describing various colors, and
then declares cp as a pointer to an object of that type and col as an object of
that type. The possible values are drawn from the set {0,1,20,21}.



Initialization 

A declarator may specify an initial value for the identifier being 
declared. The initializer is preceded by = and consists of an expression or a 
list of values nested in braces. 

initializer:

= expression
= { initializer-list }
= { initializer-list , }

initializer-list:
expression
initializer-list , initializer-list
{ initializer-list }
{ initializer-list , }

All the expressions in an initializer for a static or external variable must be 
constant expressions, which are described in "Constant Expressions," or 
expressions that reduce to the address of a previously declared variable, pos- 
sibly offset by a constant expression. Automatic or register variables may be 
initialized by arbitrary expressions involving constants and previously 
declared variables and functions. 




Static and external variables that are not initialized are guaranteed to 
start off as zero. Automatic and register variables that are not initialized are 
guaranteed to start off as garbage. 

When an initializer applies to a scalar (a pointer or an object of arith- 
metic type), it consists of a single expression, perhaps in braces. The initial 
value of the object is taken from the expression; the same conversions as for 
assignment are performed. 

When the declared variable is an aggregate (a structure or array), the 
initializer consists of a brace-enclosed, comma-separated list of initializers 
for the members of the aggregate written in increasing subscript or member 
order. If the aggregate contains subaggregates, this rule applies recursively 
to the members of the aggregate. If there are fewer initializers in the list 
than there are members of the aggregate, then the aggregate is padded with 
zeros. It is not permitted to initialize unions or automatic aggregates. 

Braces may in some cases be omitted. If the initializer begins with a left 
brace, then the succeeding comma-separated list of initializers initializes the 
members of the aggregate; it is erroneous for there to be more initializers 
than members. If, however, the initializer does not begin with a left brace, 
then only enough elements from the list are taken to account for the 
members of the aggregate; any remaining initializers are left to initialize the
next member of the aggregate of which the current aggregate is a part. 

A final abbreviation allows a char array to be initialized by a string 
literal. In this case successive characters of the string literal initialize the 
members of the array. 

For example, 

int x[] = { 1, 3, 5 }; 

declares and initializes x as a one-dimensional array that has three members, 
since no size was specified and there are three initializers. 

float y[4][3] =
{
        { 1, 3, 5 },
        { 2, 4, 6 },
        { 3, 5, 7 },
};

is a completely-bracketed initialization: 1, 3, and 5 initialize the first row of
the array y[0], namely y[0][0], y[0][1], and y[0][2]. Likewise, the next two
lines initialize y[1] and y[2]. The initializer ends early and therefore y[3] is
initialized with 0. Precisely the same effect could have been achieved by

float y[4][3] =
{
        1, 3, 5, 2, 4, 6, 3, 5, 7
};

The initializer for y begins with a left brace but that for y[0] does not; 
therefore, three elements from the list are used. Likewise, the next three 
are taken successively for y[1] and y[2]. Also,

float y[4][3] =
{
        { 1 }, { 2 }, { 3 }, { 4 }
};

initializes the first column of y (regarded as a two-dimensional array) and 
leaves the rest 0. 

Finally, 

char msg[] = "Syntax error on line %s\n"; 

shows a character array whose members are initialized with a string literal. 
The length of the string (or size of the array) includes the terminating NUL 
character, \0. 
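
The initializer forms described in this section can be placed side by side and compared. The following is a minimal sketch, not part of the original text; the names y_full, y_flat, y_col, same_4x3, and msg_size are hypothetical. Some compilers warn about the omitted inner braces in y_flat, which is expected here.

```c
#include <stddef.h>

/* Fully bracketed form: row 3 has no initializer and is zero-filled. */
float y_full[4][3] = {
    { 1, 3, 5 },
    { 2, 4, 6 },
    { 3, 5, 7 },
};

/* Inner braces omitted: each row takes the next three values in order. */
float y_flat[4][3] = { 1, 3, 5, 2, 4, 6, 3, 5, 7 };

/* One value per row: only the first column is set, the rest is 0. */
float y_col[4][3] = { { 1 }, { 2 }, { 3 }, { 4 } };

/* The array size counts the terminating NUL as well. */
char msg[] = "Syntax error on line %s\n";

/* Hypothetical helper: returns 1 when two 4x3 arrays hold equal values. */
int same_4x3(float a[4][3], float b[4][3])
{
    int i, j;

    for (i = 0; i < 4; i++)
        for (j = 0; j < 3; j++)
            if (a[i][j] != b[i][j])
                return 0;
    return 1;
}

/* Hypothetical helper: the size of msg in bytes. */
size_t msg_size(void)
{
    return sizeof msg;
}
```

The string literal holds 24 characters, so msg occupies 25 bytes once the NUL is counted.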



Type Names 

In two contexts (to specify type conversions explicitly by means of a cast 
and as an argument of sizeof), it is desired to supply the name of a data 
type. This is accomplished using a "type name," which in essence is a 
declaration for an object of that type that omits the name of the object. 

type-name:
        type-specifier abstract-declarator

abstract-declarator:
        empty
        ( abstract-declarator )
        * abstract-declarator
        abstract-declarator ( )
        abstract-declarator [ constant-expression_opt ]

To avoid ambiguity, in the construction

        ( abstract-declarator )



the abstract-declarator is required to be nonempty. Under this restriction, it 
is possible to identify uniquely the location in the abstract-declarator where 
the identifier would appear if the construction were a declarator in a 
declaration. The named type is then the same as the type of the hypothetical
identifier. For example,

int
int *
int *[3]
int (*)[3]
int *( )
int (*)( )
int (*[3])( )



name respectively the types "integer," "pointer to integer," "array of three 
pointers to integers," "pointer to an array of three integers," "function 
returning pointer to integer," "pointer to function returning an integer," and 
"array of three pointers to functions returning an integer." 

Implicit Declarations 

It is not always necessary to specify both the storage class and the type 
of identifiers in a declaration. The storage class is supplied by the context 
in external definitions and in declarations of formal parameters and struc- 
ture members. In a declaration inside a function, if a storage class but no 
type is given, the identifier is assumed to be int; if a type but no storage 
class is indicated, the identifier is assumed to be auto. An exception to the 
latter rule is made for functions because auto functions do not exist. If the 
type of an identifier is "function returning . . . ," it is implicitly declared to be 
extern. 

In an expression, an identifier followed by ( and not already declared is
contextually declared to be "function returning int." 

typedef 

Declarations whose "storage class" is typedef do not define storage but 
instead define identifiers that can be used later as if they were type key- 
words naming fundamental or derived types. 

typedef-name: 
identifier 

Within the scope of a declaration involving typedef, each identifier appear- 
ing as part of any declarator therein becomes syntactically equivalent to the 
type keyword naming the type associated with the identifier in the way 
described in "Meaning of Declarators." For example, after 

typedef int MILES, *KLICKSP;

typedef struct { double re, im; } complex; 

the constructions 

MILES distance; 

extern KLICKSP metricp; 

complex z, *zp; 

are all legal declarations; the type of distance is int, that of metricp is 
"pointer to int," and that of z is the specified structure. The zp is a pointer 
to such a structure. 

The typedef does not introduce brand-new types, only synonyms for 
types that could be specified in another way. Thus in the example above 
distance is considered to have exactly the same type as any other int object. 
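
That typedef names are synonyms rather than new types can be seen in a short sketch. It is an illustration only; the function name read_through_int_pointer is hypothetical.

```c
typedef int MILES, *KLICKSP;

/* MILES is a synonym for int, so an int * may point at a MILES object. */
int read_through_int_pointer(void)
{
    MILES distance = 26;
    int *ip = &distance;        /* no cast needed: the types are the same */
    KLICKSP metricp = ip;       /* KLICKSP is "pointer to int" */

    return *metricp;
}
```

If typedef created a distinct type, the two pointer assignments would need casts.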



Statements

Except as indicated, statements are executed in sequence.



Expression Statement 

Most statements are expression statements, which have the form 

expression ; 

Usually expression statements are assignments or function calls. 

Compound Statement or Block 

So that several statements can be used where one is expected, the com- 
pound statement (also, and equivalently, called "block") is provided: 

compound-statement: 

{ declaration-list_opt statement-list_opt }

declaration-list: 
declaration 

declaration declaration-list 

statement-list: 
statement 

statement statement-list 

If any of the identifiers in the declaration-list were previously declared, the 
outer declaration is pushed down for the duration of the block, after which 
it resumes its force. 

Any initializations of auto or register variables are performed each time 
the block is entered at the top. It is currently possible (but a bad practice) 
to transfer into a block; in that case the initializations are not performed. 
Initializations of static variables are performed only once when the program 
begins execution. Inside a block, extern declarations do not reserve storage 
so initialization is not permitted. 
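
The rules for redeclaration and initialization inside a block can be sketched as follows. This is an illustration only; the function names shadow_demo and next_id are hypothetical.

```c
/* The inner x hides the outer one and is initialized on every entry. */
int shadow_demo(void)
{
    int x = 1;
    int total = 0;
    int i;

    for (i = 0; i < 3; i++) {
        int x = 10;     /* pushes the outer x down for this block */

        x++;
        total += x;     /* adds 11 each time, never 12 or 13 */
    }
    return total + x;   /* the outer x is back in force: 33 + 1 */
}

/* A static variable is initialized once and keeps its value between calls. */
int next_id(void)
{
    static int id = 100;

    return id++;
}
```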

Conditional Statement

The two forms of the conditional statement are 

if ( expression ) statement 

if ( expression ) statement else statement 

In both cases, the expression is evaluated; if it is nonzero, the first substate- 
ment is executed. In the second case, the second substatement is executed if 
the expression is 0. The else ambiguity is resolved by connecting an else 
with the last encountered else-less if. 
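
The else rule is easy to misread when the indentation suggests otherwise. The sketch below is an illustration only, with hypothetical function names; the first function is laid out misleadingly on purpose, and many compilers warn about it.

```c
/* The indentation suggests the else belongs to the outer if; it does not. */
int classify(int a, int b)
{
    int result = 0;

    if (a > 0)
        if (b > 0)
            result = 1;
    else
        result = 2;     /* pairs with "if (b > 0)", the last else-less if */

    return result;
}

/* Braces attach the else to the outer if instead. */
int classify_braced(int a, int b)
{
    int result = 0;

    if (a > 0) {
        if (b > 0)
            result = 1;
    } else
        result = 2;

    return result;
}
```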



while Statement 

The while statement has the form 

while ( expression ) statement 

The substatement is executed repeatedly so long as the value of the expres- 
sion remains nonzero. The test takes place before each execution of the 
statement. 



do Statement 

The do statement has the form 

do statement while ( expression ) ; 

The substatement is executed repeatedly until the value of the expression 
becomes 0. The test takes place after each execution of the statement. 

for Statement 

The for statement has the form: 

for ( exp-1_opt ; exp-2_opt ; exp-3_opt ) statement

Except for the behavior of continue, this statement is equivalent to

exp-1 ;
while ( exp-2 )
{
        statement
        exp-3 ;
}

Thus the first expression specifies initialization for the loop; the second 
specifies a test, made before each iteration, such that the loop is exited when 
the expression becomes 0. The third expression often specifies an incre- 
menting that is performed after each iteration. 

Any or all of the expressions may be dropped. A missing exp-2 makes
the implied while clause equivalent to while(1); other missing expressions
are simply dropped from the expansion above. 
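
The one difference between the for statement and its while expansion, the behavior of continue, can be sketched as follows. This is an illustration only; the function names are hypothetical.

```c
/* Sums the odd numbers below n with a for loop. */
int sum_odd_for(int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++) {
        if (i % 2 == 0)
            continue;       /* goes to i++, then to the test */
        sum += i;
    }
    return sum;
}

/* The while expansion: exp-3 must still run, hence the goto. */
int sum_odd_while(int n)
{
    int i, sum = 0;

    i = 0;                  /* exp-1 */
    while (i < n) {         /* exp-2 */
        if (i % 2 == 0)
            goto contin;    /* a plain continue would skip i++ and never end */
        sum += i;
    contin:
        i++;                /* exp-3 */
    }
    return sum;
}
```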



switch Statement 

The switch statement causes control to be transferred to one of several 
statements depending on the value of an expression. It has the form 

switch ( expression ) statement 

The usual arithmetic conversion is performed on the expression, but the 
result must be int. The statement is typically compound. Any statement 
within the statement may be labeled with one or more case prefixes as fol- 
lows: 

case constant-expression :

where the constant expression must be int. No two of the case constants in 
the same switch may have the same value. Constant expressions are pre- 
cisely defined in "Constant Expressions." 

There may also be at most one statement prefix of the form 

default : 

which properly goes at the end of the case constants. 

When the switch statement is executed, its expression is evaluated and
compared in turn with each case constant. If one of the case constants is 
equal to the value of the expression, control is passed to the statement fol- 
lowing the matched case prefix. If no case constant matches the expression 
and if there is a default prefix, control passes to the statement prefixed by 
default. If no case matches and if there is no default, then none of the 
statements in the switch is executed. 

The prefixes case and default do not alter the flow of control, which 
continues unimpeded across such prefixes. That is, once a case constant is 
matched, all case statements (and the default) from there to the end of the 
switch are executed. To exit from a switch, see "break Statement." 
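
How control continues across case prefixes can be seen by comparing a switch without break to one with it. The sketch is an illustration only; the function names are hypothetical.

```c
/* Without break, control runs on through every later case and the default. */
int fall_through(int c)
{
    int n = 0;

    switch (c) {
    case 1:
        n += 1;
        /* fall through */
    case 2:
        n += 10;
        /* fall through */
    default:
        n += 100;
    }
    return n;
}

/* With break, only the statements of the matched case are executed. */
int with_break(int c)
{
    int n = 0;

    switch (c) {
    case 1:
        n += 1;
        break;
    case 2:
        n += 10;
        break;
    default:
        n += 100;
        break;
    }
    return n;
}
```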

Usually, the statement that is the subject of a switch is compound. 
Declarations may appear at the head of this statement, but initializations of 
automatic or register variables are ineffective. A simple example of a com- 
plete switch statement is: 




switch (c) {

case 'o':
        oflag = TRUE;
        break;

case 'p':
        pflag = TRUE;
        break;

case 'r':
        rflag = TRUE;
        break;

default:
        (void) fprintf(stderr, "Unknown option\n");
        exit(2);
}



break Statement

The statement break ; causes termination of the smallest enclosing 
while, do, for, or switch statement; control passes to the statement follow- 
ing the terminated statement. 



continue Statement 

The statement continue ; causes control to pass to the loop-continuation 
portion of the smallest enclosing while, do, or for statement; that is to the 
end of the loop. More precisely, in each of the statements

while (...) {         do {                  for (...) {
        ...                   ...                   ...
contin: ;             contin: ;             contin: ;
}                     } while (...);        }

a continue is equivalent to goto contin. (Following the contin: is a null 
statement; see "Null Statement.") 



return Statement 

A function returns to its caller by means of the return statement, which 
has one of the forms 

return ; 

return expression ; 

In the first case, the returned value is undefined. In the second case, the 
value of the expression is returned to the caller of the function. If required, 
the expression is converted, as if by assignment, to the type of the function in
which it appears. Flowing off the end of a function is equivalent to a
return with no returned value. 

goto Statement 

Control may be transferred unconditionally by means of the statement

goto identifier ;

The identifier must be a label (see "Labeled Statement") located in the 
current function. 

Labeled Statement 

Any statement may be preceded by label prefixes of the form

identifier :

which serve to declare the identifier as a label. The only use of a label is as 
a target of a goto. The scope of a label is the current function, excluding 
any subblocks in which the same identifier has been redeclared. See "Scope 
Rules." 

Null Statement 

The null statement has the form

;

A null statement is useful to carry a label just before the } of a compound 
statement or to supply a null body to a looping statement such as while. 



External Definitions

A C program consists of a sequence of external definitions. An external
definition declares an identifier to have storage class extern (by default) or 
perhaps static, and a specified type. The type-specifier (see "Type Specifiers" 
in "Declarations") may also be empty, in which case the type is taken to be 
int. The scope of external definitions persists to the end of the file in 
which they are declared just as the effect of declarations persists to the end 
of a block. The syntax of external definitions is the same as that of all 
declarations except that only at this level may the code for functions be 
given. 



External Function Definitions 

Function definitions have the form 

function-definition:
        decl-specifiers_opt function-declarator function-body

The only sc-specifiers allowed among the decl-specifiers are extern or static;
see "Scope of Externals" in "Scope Rules" for the distinction between them.
A function declarator is similar to a declarator for a "function returning ..."
except that it lists the formal parameters of the function being defined.

function-declarator:
        declarator ( parameter-list_opt )

parameter-list:
        identifier
        identifier , parameter-list

The function-body has the form

function-body:
        declaration-list_opt compound-statement

The identifiers in the parameter list, and only those identifiers, may be 
declared in the declaration list. Any identifiers whose type is not given are 
taken to be int. The only storage class that may be specified is register; if it 
is specified, the corresponding actual parameter will be copied, if possible, 
into a register at the outset of the function. 

A simple example of a complete function definition is

int max(a, b, c)
int a, b, c;
{
        int m;

        m = (a > b) ? a : b;
        return((m > c) ? m : c);
}





Here int is the type-specifier; max(a, b, c) is the function-declarator;
int a, b, c; is the declaration-list for the formal parameters; { ... } is the block
giving the code for the statement.

C converts all float actual parameters to double, so formal
parameters declared float have their declaration adjusted to read double. 
All char and short formal parameter declarations are similarly adjusted to 
read int. Also, since a reference to an array in any context (in particular as 
an actual parameter) is taken to mean a pointer to the first element of the 
array, declarations of formal parameters declared "array of ..." are adjusted
to read "pointer to ...."
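
The adjustment of array parameters can be observed directly. The sketch below is an illustration only and makes one assumption that differs from the text: it uses prototype-style definitions, since newer compilers may reject the older parameter-list form shown above. The function names are hypothetical.

```c
/* Declared "array of 10 int", but the parameter really has type int *. */
int second_element(int a[10])
{
    a++;                /* legal only because a is a pointer, not an array */
    return *a;
}

/* Hypothetical driver: the array name is passed as a pointer to table[0]. */
int adjust_demo(void)
{
    int table[10] = { 5, 6, 7 };

    return second_element(table);
}
```

Incrementing a true array would be an error, since arrays are not lvalues; the increment compiles only because the declaration has been adjusted.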



External Data Definitions 

An external data definition has the form 

data-definition: 
declaration 

The storage class of such data may be extern (which is the default) or static, 
but not auto or register. 



Scope Rules

A C program need not all be compiled at the same time. The source
text of the program may be kept in several files, and precompiled routines 
may be loaded from libraries. Communication among the functions of a 
program may be carried out both through explicit calls and through mani- 
pulation of external data. 

Therefore, there are two kinds of scopes to consider: first, what may be 
called the lexical scope of an identifier, which is essentially the region of a 
program during which it may be used without drawing "undefined 
identifier" diagnostics; and second, the scope associated with external 
identifiers, which is characterized by the rule that references to the same 
external identifier are references to the same object. 



Lexical Scope 

The lexical scope of identifiers declared in external definitions persists 
from the definition through the end of the source file in which they appear. 
The lexical scope of identifiers that are formal parameters persists through 
the function with which they are associated. The lexical scope of identifiers 
declared at the head of a block persists until the end of the block. The lexi- 
cal scope of labels is the whole of the function in which they appear. 

In all cases, however, if an identifier is explicitly declared at the head of 
a block, including the block constituting a function, any declaration of that 
identifier outside the block is suspended until the end of the block. 

Remember also (see "Structure, Union, and Enumeration Declarations" in 
"Declarations") that tags, identifiers associated with ordinary variables, and 
identifiers associated with structure and union members form three disjoint
classes which do not conflict. Members and tags follow the same scope 
rules as other identifiers. The enum constants are in the same class as ordi- 
nary variables and follow the same scope rules. The typedef names are in 
the same class as ordinary identifiers. They may be redeclared in inner 
blocks, but an explicit type must be given in the inner declaration: 

typedef float distance;
...
{
        int distance;
        ...




The int must be present in the second declaration, or it would be taken to 
be a declaration with no declarators and type distance. 

Scope of Externals 

If a function refers to an identifier declared to be extern, then some- 
where among the files or libraries constituting the complete program there 
must be at least one external definition for the identifier. All functions in a 
given program that refer to the same external identifier refer to the same 
object, so care must be taken that the type and size specified in the 
definition are compatible with those specified by each function that refer- 
ences the data. 

It is illegal to explicitly initialize any external identifier more than once 
in the set of files and libraries comprising a multi-file program. It is legal to 
have more than one data definition for any external non-function identifier; 
explicit use of extern does not change the meaning of an external declara- 
tion. 

In restricted environments, the use of the extern storage class takes on 
an additional meaning. In these environments, the explicit appearance of 
the extern keyword in external data declarations of identifiers without
initialization indicates that the storage for the identifiers is allocated
elsewhere, either in this file or another file. It is required that there be exactly
one definition of each external identifier (without extern) in the set of files 
and libraries that make up a multi-file program. 

Identifiers declared static at the top level in external definitions are not 
visible in other files. Functions may be declared static. 



Compiler Control Lines

The C compilation system contains a preprocessor capable of macro sub-
stitution, conditional compilation, and inclusion of named files. Lines 
beginning with # communicate with this preprocessor. There may be any 
number of blanks and horizontal tabs between the # and the directive, but 
no additional material (such as comments) is permitted. These lines have 
syntax independent of the rest of the language; they may appear anywhere 
and have effect that lasts (independent of scope) until the end of the source 
program file. 



Token Replacement

A control line of the form 

#define identifier token-string

causes the preprocessor to replace subsequent instances of the identifier 
with the given string of tokens. Semicolons in or at the end of the token- 
string are part of that string. A line of the form 

#define identifier(identifier, ... ) token-string_opt

where there is no space between the first identifier and the (, is a macro 
definition with arguments. There may be zero or more formal parameters. 
Subsequent instances of the first identifier followed by a (, a sequence of 
tokens delimited by commas, and a ) are replaced by the token string in the 
definition. Each occurrence of an identifier mentioned in the formal param- 
eter list of the definition is replaced by the corresponding token string from 
the call. The actual arguments in the call are token strings separated by 
commas; however, commas in quoted strings or protected by parentheses do 
not separate arguments. The number of formal and actual parameters must 
be the same. Strings and character constants in the token-string are scanned 
for formal parameters, but strings and character constants in the rest of the 
program are not scanned for defined identifiers to replace. 

In both forms the replacement string is rescanned for more defined 
identifiers. In both forms a long definition may be continued on another 
line by writing \ at the end of the line to be continued. This facility is most 
valuable for definition of "manifest constants," as in 

#define TABSIZE 100

int table[TABSIZE];

A control line of the form

#undef identifier

causes the identifier's preprocessor definition (if any) to be forgotten. 

If a #defined identifier is the subject of a subsequent #define with no 
intervening #undef, then the two token-strings are compared textually. If
the two token-strings are not identical (all white space is considered as 
equivalent), then the identifier is considered to be redefined. 
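
Both forms of #define, the rescanning of replacement text, and #undef can be sketched together. This is an illustration only; apart from TABSIZE and table, every name is hypothetical.

```c
#define TABSIZE 100                     /* manifest constant */
#define DOUBLE(n) ((n) + (n))           /* macro with one formal parameter */
#define TWICE_TABSIZE DOUBLE(TABSIZE)   /* replacement text is rescanned */

int table[TABSIZE];
int wide_table[TWICE_TABSIZE];

/* Hypothetical helpers that report the element counts chosen above. */
int table_len(void)
{
    return (int) (sizeof table / sizeof table[0]);
}

int wide_len(void)
{
    return (int) (sizeof wide_table / sizeof wide_table[0]);
}

#undef TABSIZE              /* forget the old definition first */
#define TABSIZE 50          /* now a fresh definition, not a redefinition */

int small_table[TABSIZE];

int small_len(void)
{
    return (int) (sizeof small_table / sizeof small_table[0]);
}
```

The parentheses in DOUBLE keep an argument such as a + b from combining with neighboring operators after substitution.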



File Inclusion 

A control line of the form 

#include "filename"

causes the replacement of that line by the entire contents of the file filename. 
The named file is searched for first in the directory of the file containing 
the #include, and then in a sequence of specified or standard places. Alter- 
natively, a control line of the form 

#include <filename>

searches only the specified or standard places and not the directory of the 
#include. (How the places are specified is not part of the language. See 
cpp(1) for a description of how to specify additional libraries.)

#includes may be nested. 



Conditional Compilation 

A compiler control line of the form 

#if restricted-constant-expression 

checks whether the restricted-constant expression evaluates to nonzero. 
(Constant expressions are discussed in "Constant Expressions"; the following 
additional restrictions apply here: the constant expression may not contain 
sizeof, casts, or an enumeration constant.) 

A restricted-constant expression may also contain the additional unary
expression 

defined identifier 

or 

defined (identifier) 

which evaluates to one if the identifier is currently defined in the prepro- 
cessor and zero if it is not. 

All currently defined identifiers in restricted-constant-expressions are 
replaced by their token-strings (except those identifiers modified by 
defined) just as in normal text. The restricted-constant expression will be
evaluated only after all macro expansions have finished. During this evaluation,
all identifiers still undefined (to the preprocessor) evaluate to zero.

A control line of the form 

#ifdef identifier 

checks whether the identifier is currently defined in the preprocessor; i.e., 
whether it has been the subject of a #define control line. It is equivalent to 
#if defined (identifier). 

A control line of the form 

#ifndef identifier 

checks whether the identifier is currently undefined in the preprocessor. It 
is equivalent to #if !defined (identifier).

All three forms are followed by an arbitrary number of lines, possibly 
containing a control line 

#else 

and then by a control line

#endif

If the checked condition is true, then any lines between #else and #endif 
are ignored. If the checked condition is false, then any lines between the 
test and a #else or, lacking a #else, the #endif are ignored. 






Another control directive is 

#elif restricted-constant-expression 

An arbitrary number of #elif directives can be included between #if, 
#ifdef, or #ifndef and #else, or #endif directives. These constructions 
may be nested. 
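
The conditional directives can be combined as in the following sketch. It is an illustration only; every macro and function name in it is hypothetical.

```c
#define VERBOSE 2           /* hypothetical configuration settings */
#define HAVE_LOG

#if defined(HAVE_LOG) && VERBOSE > 1
#define LEVEL 2             /* this branch is kept */
#elif defined(HAVE_LOG)
#define LEVEL 1
#else
#define LEVEL 0
#endif

#ifndef BUFSIZE_HINT        /* not defined so far, so the line below is kept */
#define BUFSIZE_HINT 512
#endif

#if UNDEFINED_NAME          /* an undefined identifier evaluates to zero */
#define SURPRISE 1
#else
#define SURPRISE 0
#endif

int level(void)        { return LEVEL; }
int bufsize_hint(void) { return BUFSIZE_HINT; }
int surprise(void)     { return SURPRISE; }
```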



Line Control 

For the benefit of other preprocessors that generate C programs, a line 
of the form 

#line constant "filename"

causes the compiler to believe, for purposes of error diagnostics, that the 
line number of the next source line is given by the constant and the current 
input file is named by "filename". If "filename" is absent, the remembered file 
name does not change. 
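
The effect of #line can be observed through the predefined macros __LINE__ and __FILE__, which report the line number and file name the compiler currently believes in. The sketch is an illustration only; the file name grammar.y and the function names are hypothetical.

```c
/* As a program generator might emit: the next line is numbered 500. */
#line 500 "grammar.y"
int reported_line(void) { return __LINE__; }

/* The remembered file name stays changed for the lines that follow. */
const char *reported_file(void)
{
    return __FILE__;
}
```

Diagnostics for the lines after the directive then refer to grammar.y rather than to the generated C file.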



Version Control 

This capability, known as S-lists, helps administer version control infor- 
mation. A line of the form 

#ident "version" 

puts any arbitrary string in the .comment section of the a.out file. It is usu- 
ally used for version control. It is worth remembering that .comment sec- 
tions are not loaded into memory when the a.out file is executed. 






Types Revisited 



This part summarizes the operations that can be performed on objects of 
certain types. 

Structures and Unions 

Structures and unions may be assigned, passed as arguments to func- 
tions, and returned by functions. Other plausible operators, such as equal- 
ity comparison and structure casts, are not implemented. 

In a reference to a structure or union member, the name on the right of 
the -> or the . must specify a member of the aggregate named or pointed
to by the expression on the left. In general, a member of a union may not 
be inspected unless the value of the union has been assigned using that 
same member. However, one special guarantee is made by the language in 
order to simplify the use of unions: if a union contains several structures 
that share a common initial sequence and if the union currently contains 
one of these structures, it is permitted to inspect the common initial part of 
any of the contained structures. For example, the following is a legal frag- 
ment: 

union
{
        struct
        {
                int     type;
        } n;

        struct
        {
                int     type;
                int     intnode;
        } ni;

        struct
        {
                int     type;
                float   floatnode;
        } nf;
} u;

u.nf.type = FLOAT;
u.nf.floatnode = 3.14;

if (u.n.type == FLOAT)
        . . . sin(u.nf.floatnode) . . .



Functions

There are only two things that can be done with a function: call it or
take its address. If the name of a function appears in an expression not in
the function-name position of a call, a pointer to the function is generated.
Thus, to pass one function to another, one might say

int f( );
...
g(f);

Then the definition of g might read

g(funcp)
int (*funcp)( );
{
        ...
        (*funcp)( );
        ...
}

Notice that f must be declared explicitly in the calling routine since its
appearance in g(f) was not followed by (. 
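
Passing a function to another function can be sketched in a form that compiles as it stands. This is an illustration only and assumes prototype-style declarations in place of the older form used in the text; the names seven and pass_function are hypothetical.

```c
/* Hypothetical function to be passed. */
int seven(void)
{
    return 7;
}

/* g receives a pointer to a function and calls through it. */
int g(int (*funcp)(void))
{
    return (*funcp)() + 1;
}

/* seven is not followed by (, so a pointer to it is generated. */
int pass_function(void)
{
    return g(seven);
}
```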

Arrays, Pointers, and Subscripting 

Every time an identifier of array type appears in an expression, it is con- 
verted into a pointer to the first member of the array. Because of this 
conversion, arrays are not lvalues. By definition, the subscript operator [] is 
interpreted in such a way that E1[E2] is identical to *((E1)+(E2)). Because 
of the conversion rules that apply to +, if E1 is an array and E2 an integer,
then E1[E2] refers to the E2-th member of E1. Therefore, despite its asymmetric
appearance, subscripting is a commutative operation.
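
The equivalence of E1[E2] and *((E1)+(E2)) can be checked with a short sketch. It is an illustration only; the function name commute_demo is hypothetical.

```c
/* Returns 1 when all three spellings name the same element. */
int commute_demo(void)
{
    int a[4] = { 10, 20, 30, 40 };
    int i = 2;

    return a[i] == *(a + i) && a[i] == i[a] && a[i] == 30;
}
```

The form i[a] is legal because + is commutative, although it is rarely written on purpose.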

A consistent rule is followed in the case of multidimensional arrays. If 
E is an n-dimensional array of rank i x j x ... x k, then E appearing in an
expression is converted to a pointer to an (n-1)-dimensional array with
rank j x ... x k. If the * operator, either explicitly or implicitly as a result of
subscripting, is applied to this pointer, the result is the pointed-to (n-1)- 
dimensional array, which itself is immediately converted into a pointer. 

For example, consider int x[3][5]; Here x is a 3x5 array of integers.
When x appears in an expression, it is converted to a pointer to (the first of
three) 5-membered arrays of integers. In the expression x[i], which is
equivalent to *(x+i), x is first converted to a pointer as described; then i is
converted to the type of x, which involves multiplying i by the length of the
object to which the pointer points, namely 5 integer objects. The results are
added and indirection applied to yield an array (of five integers) which in
turn is converted to a pointer to the first of the integers. If there is another
subscript, the same argument applies again; this time the result is an
integer.

Arrays in C are stored row-wise (last subscript varies fastest), and the
first subscript in the declaration helps determine the amount of storage
consumed by an array. The first subscript plays no other part in subscript
calculations.
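
Row-wise storage means that x[i][j] lies 5*i + j integers from the start of x. The sketch below is an illustration only; the function names int_offset and row_bytes are hypothetical.

```c
#include <stddef.h>

int x[3][5];

/* Position of x[i][j], counted in ints from the start of x. */
ptrdiff_t int_offset(int i, int j)
{
    return ((char *) &x[i][j] - (char *) x) / (ptrdiff_t) sizeof(int);
}

/* Each x[i] is itself an array of five ints. */
size_t row_bytes(void)
{
    return sizeof x[0];
}
```

Only the second dimension, 5, enters the offset calculation; the first dimension, 3, affects only the total storage.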

Explicit Pointer Conversions 

Certain conversions involving pointers are permitted but have 
implementation-dependent aspects. They are all specified by means of an 
explicit type-conversion operator; see "Unary Operators" under "Expressions"
and "Type Names" under "Declarations." 

A pointer may be converted to any of the integral types large enough to 
hold it. Whether an int or long is required is machine dependent. The 
mapping function is also machine dependent but is intended to be 
unsurprising to those who know the addressing structure of the machine. 

An object of integral type may be explicitly converted to a pointer. The 
mapping always carries an integer converted from a pointer back to the 
same pointer but is otherwise machine dependent. 

A pointer to one type may be converted to a pointer to another type. 
The resulting pointer may cause addressing exceptions upon use if the sub- 
ject pointer does not refer to an object suitably aligned in storage. It is 
guaranteed that a pointer to an object of a given size may be converted to a 
pointer to an object of a smaller size and back again without change. 

For example, a storage-allocation routine might accept a size (in bytes) 
of an object to allocate, and return a char pointer; it might be used in this 
way. 

extern char *alloc( );
double *dp;

dp = (double *) alloc(sizeof(double));
*dp = 22.0 / 7.0;

The alloc must ensure (in a machine-dependent way) that its return value is 
suitable for conversion to a pointer to double; then the use of the function 
is portable. 
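
A version of this fragment that can be compiled on its own is sketched below. It is an illustration only and rests on an assumption: alloc is written here as a thin wrapper around the standard library function malloc, which returns storage suitably aligned for any object. The function name store_ratio is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-in for alloc, built on malloc. */
char *alloc(size_t nbytes)
{
    return (char *) malloc(nbytes);
}

/* Stores 22.0 / 7.0 through the converted pointer and reads it back. */
double store_ratio(void)
{
    double *dp;
    double value;

    dp = (double *) alloc(sizeof(double));
    if (dp == NULL)
        return 0.0;         /* allocation failed */
    *dp = 22.0 / 7.0;
    value = *dp;
    free(dp);
    return value;
}
```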



Constant Expressions

In several places C requires expressions that evaluate to a constant:
after case, as array bounds, and in initializers. In the first two cases, the 
expression can involve only integer constants, character constants, casts to 
integral types, enumeration constants, and sizeof expressions, possibly con- 
nected by the binary operators 

+ - * / % & | ^ << >> == != < > <= >= && ||

or by the unary operators

- ~

or by the ternary operator

?:

Parentheses can be used for grouping but not for function calls. 

More latitude is permitted for initializers; besides constant expressions 
as discussed above, one can also use floating constants and arbitrary casts 
and can also apply the unary & operator to external or static objects and to 
external or static arrays subscripted with a constant expression. The unary 
& can also be applied implicitly by appearance of unsubscripted arrays and 
functions. The basic rule is that initializers must evaluate either to a con- 
stant or to the address of a previously declared external or static object plus 
or minus a constant. 
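As a sketch of these rules, the fragment below uses constant expressions in each of the three places named: as array bounds, after case, and in initializers. All identifiers (ROWS, COLS, table, third, ratio, classify) are hypothetical names chosen for the illustration.

```c
enum { ROWS = 4, COLS = 2 * ROWS };           /* enumeration constants */

static int table[ROWS * COLS + sizeof(int)];  /* array bound: integer constants and sizeof */
static int *third = &table[2];                /* initializer: address of a static array
                                                 subscripted with a constant */
static double ratio = 22.0 / 7.0;             /* initializers may use floating constants */

static int classify(int c)
{
    switch (c) {
    case 'a' + 1:             /* character constant combined with + */
        return 1;
    case (ROWS << 1) | 1:     /* shift and bitwise OR on constants */
        return 2;
    default:
        return 0;
    }
}
```

A function call in any of these positions would be rejected, since parentheses may group but may not call.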






Portability Considerations 

Certain parts of C are inherently machine dependent. The following 
list of potential trouble spots is not meant to be all-inclusive but to point 
out the main ones. 

Purely hardware issues like word size and the properties of floating 
point arithmetic and integer division have proven in practice to be not 
much of a problem. Other facets of the hardware are reflected in differing 
implementations. Some of these, particularly sign extension (converting a 
negative character into a negative integer) and the order in which bytes are 
placed in a word, are nuisances that must be carefully watched. Most of the 
others are only minor problems. 

The number of register variables that can actually be placed in registers 
varies from machine to machine as does the set of valid types. Nonetheless, 
the compilers all do things properly for their own machine; excess or 
invalid register declarations are ignored. 

The order of evaluation of function arguments is not specified by the
language. The order in which side effects take place is also unspecified.

Since character constants are really objects of type int, multicharacter 
character constants may be permitted. The specific implementation is very 
machine dependent because the order in which characters are assigned to a 
word varies from one machine to another. 

Fields are assigned to words and characters to integers right to left on 
some machines and left to right on other machines. These differences are 
invisible to isolated programs that do not indulge in type punning (e.g., by 
converting an int pointer to a char pointer and inspecting the pointed-to 
storage) but must be accounted for when conforming to externally-imposed 
storage layouts. 
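The kind of type punning mentioned above can be sketched as follows. The function name low_byte_first is hypothetical; the code inspects the storage of an integer through a char pointer to learn which byte the machine places first.

```c
/* Hypothetical probe: returns 1 if the least significant byte of an
   unsigned int is stored at the lowest address, 0 otherwise. */
static int low_byte_first(void)
{
    unsigned int word = 1;
    unsigned char *bytes = (unsigned char *) &word;  /* view the int as chars */
    return bytes[0] == 1;
}
```

A program that never looks at storage this way does not see the difference; code that reads or writes an externally defined layout has to account for it.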
This summary of C syntax is intended more for aiding comprehension
than as an exact statement of the language. 

Expressions 

The basic expressions are: 

expression:
        primary
        * expression
        & lvalue
        - expression
        ! expression
        ~ expression
        ++ lvalue
        -- lvalue
        lvalue ++
        lvalue --
        sizeof expression
        sizeof ( type-name )
        ( type-name ) expression
        expression binop expression
        expression ? expression : expression
        lvalue asgnop expression
        expression , expression

primary:
        identifier
        constant
        string literal
        ( expression )
        primary ( expression-list )
        primary [ expression ]
        primary . identifier
        primary -> identifier
lvalue:
        identifier
        primary [ expression ]
        lvalue . identifier
        primary -> identifier
        * expression
        ( lvalue )

The primary-expression operators 

        () [] . ->

have highest priority and group left to right. The unary operators 

        * & - ! ~ ++ -- sizeof ( type-name )

have priority below the primary operators but higher than any binary 
operator and group right to left. Binary operators group left to right; they 
have priority decreasing as indicated below. 

binop:
        * / %
        + -
        >> <<
        < > <= >=
        == !=
        &
        ^
        |
        &&
        ||

The conditional operator groups right to left. 

Assignment operators all have the same priority and all group right to
left.

asgnop:
        = += -= *= /= %= >>= <<= &= ^= |=

The comma operator has the lowest priority and groups left to right. 
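A few of these grouping rules can be checked directly. The sketch below is only an illustration; precedence_demo is a hypothetical name, and each comparison pairs an unparenthesized expression with the fully parenthesized form the rules predict.

```c
/* Hypothetical check of operator priority and grouping.
   Returns 1 when every expression groups as the summary describes. */
static int precedence_demo(void)
{
    int a = 6, b = 2, c = 2, x, y;
    int ok = 1;

    ok &= (a & b == c) == (a & (b == c));    /* == has higher priority than & */
    ok &= (a & b == c) != ((a & b) == c);    /* so this grouping differs */
    ok &= (a - b - c) == ((a - b) - c);      /* binary operators group left to right */
    ok &= (a - b - c) != (a - (b - c));
    ok &= (-a * b) == ((-a) * b);            /* unary operators bind tighter than binary */
    x = y = 7;                               /* assignment groups right to left */
    ok &= (x == 7 && y == 7);
    ok &= (a > b ? 1 : c > b ? 2 : 3)
          == (a > b ? 1 : (c > b ? 2 : 3));  /* ?: groups right to left */
    return ok;
}
```

The first two lines show the classic surprise: a & b == c compares b with c first, so parentheses are needed when the mask is meant to apply first.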
Declarations

declaration:
        decl-specifiers init-declarator-list ;

decl-specifiers:
        type-specifier decl-specifiers
        sc-specifier decl-specifiers

sc-specifier:
        auto
        static
        extern
        register
        typedef

type-specifier:
        struct-or-union-specifier
        typedef-name
        enum-specifier

basic-type-specifier:
        basic-type
        basic-type basic-type-specifiers

basic-type:
        char
        short
        int
        long
        unsigned
        float
        double
        void
enum-specifier:
        enum { enum-list }
        enum identifier { enum-list }
        enum identifier

enum-list:
        enumerator
        enum-list , enumerator

enumerator:
        identifier
        identifier = constant-expression

init-declarator-list:
        init-declarator
        init-declarator , init-declarator-list

init-declarator:
        declarator initializer

declarator:
        identifier
        ( declarator )
        * declarator
        declarator ( )
        declarator [ constant-expression_opt ]

struct-or-union-specifier:
        struct { struct-decl-list }
        struct identifier { struct-decl-list }
        struct identifier
        union { struct-decl-list }
        union identifier { struct-decl-list }
        union identifier

struct-decl-list:
        struct-declaration
        struct-declaration struct-decl-list
struct-declaration:
        type-specifier struct-declarator-list ;

struct-declarator-list:
        struct-declarator
        struct-declarator , struct-declarator-list

struct-declarator:
        declarator
        declarator : constant-expression
        : constant-expression

initializer:
        = expression
        = { initializer-list }
        = { initializer-list , }

initializer-list:
        expression
        initializer-list , initializer-list
        { initializer-list }
        { initializer-list , }

type-name:
        type-specifier abstract-declarator

abstract-declarator:
        empty
        ( abstract-declarator )
        * abstract-declarator
        abstract-declarator ( )
        abstract-declarator [ constant-expression_opt ]

typedef-name:
        identifier
Statements

compound-statement:
        { declaration-list_opt statement-list_opt }

declaration-list:
        declaration
        declaration declaration-list

statement-list:
        statement
        statement statement-list

statement:
        compound-statement
        expression ;
        if ( expression ) statement
        if ( expression ) statement else statement
        while ( expression ) statement
        do statement while ( expression ) ;
        for ( exp_opt ; exp_opt ; exp_opt ) statement
        switch ( expression ) statement
        case constant-expression : statement
        default : statement
        break ;
        continue ;
        return ;
        return expression ;
        goto identifier ;
        identifier : statement
External Definitions

program:
        external-definition
        external-definition program

external-definition:
        function-definition
        data-definition

function-definition:
        decl-specifier_opt function-declarator function-body

function-declarator:
        declarator ( parameter-list )

parameter-list:
        identifier
        identifier , parameter-list

function-body:
        declaration-list_opt compound-statement

data-definition:
        extern declaration ;
        static declaration ;
Preprocessor

        #define identifier token-string
        #define identifier(identifier, ...) token-string
        #undef identifier
        #include "filename"
        #include <filename>
        #if restricted-constant-expression
        #ifdef identifier
        #ifndef identifier
        #elif restricted-constant-expression
        #else
        #endif
        #line constant "filename"
        #ident "version"
Introduction

This appendix contains information useful to the programmer who uses 
floating point extensively. 

The C Programming Language Utilities support a subset of the IEEE 
Standard for Binary Floating Point Arithmetic (ANSI/IEEE Std 754-1985). The C
compiler uses the IEEE standard single- and double-precision data types, 
operations, and conversions. For more details on the subset of the IEEE 
standard supported by the C Programming Language Utilities see "IEEE 
Requirements" in this appendix. 

Library functions are provided for the programmer who needs the full 
range of IEEE support. Most programmers won't need any special functions 
to use floating point operations in their programs, but those who do require 
extra detail about floating point operations can find it in this appendix. 

This appendix contains sections on the following topics: 

■ IEEE Arithmetic 

■ Floating Point Exception Handling 

■ Conversion Between Binary and Decimal Values 

■ Single-Precision Floating Point Operations 

■ Implicit Precision of Subexpressions 

■ IEEE Requirements 



NOTE

All floating point operations can potentially have side effects, in the
sense that they can set the exception sticky bits or cause a floating
point trap. The optimizer removes floating point operations without
regard to these side effects if the result of those operations is otherwise
unused. If a program fragment intentionally produces such side
effects, the optimizer might alter the behavior of the program fragment.
IEEE Arithmetic

This section provides information to those programmers who need to 
know the details of floating point representation, the environment of the 3B 
Computer family, and exception handling. Most users need not be con- 
cerned with the details of the floating point environment. Note that some 
programs which used to dump core will proceed with computations with 
diagnostic values or floating point "infinities." 



NOTE 



The floating point subsystem of the computers in the 3B Computer family
is based on the Standard for Binary Floating-Point Arithmetic, ANSI/IEEE
Std 754-1985. For more information about this standard, write to IEEE
Service Center, 445 Hoes Lane, Piscataway, NJ, 08854, or call (201)
981-0060.



Data Types and Formats 
Single-Precision 

Single-precision floating point numbers have the following format: 



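  31   30          23  22                       0
+------+--------------+------------------------+
| SIGN |   EXPONENT   |        FRACTION        |
+------+--------------+------------------------+
                      ^
                 binary point

Field       Position    Full Name

sign        31          sign bit (0==positive, 1==negative)
exponent    30-23       exponent (biased by 127)
fraction    22-0        fraction (bits to right of binary point)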
Double-Precision

Double-precision floating point numbers have the following format: 



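  63   62          52  51                       0
+------+--------------+------------------------+
| SIGN |   EXPONENT   |        FRACTION        |
+------+--------------+------------------------+
                      ^
                 binary point

Field       Position    Full Name

sign        63          sign bit (0==positive, 1==negative)
exponent    62-52       exponent (biased by 1023)
fraction    51-0        fraction (bits to right of binary point)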
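The single-precision field positions can be illustrated by unpacking a float into its three fields. This is a sketch under stated assumptions: float is taken to be the 32-bit IEEE single format, and the names sp_fields and sp_decode are hypothetical. The same approach works for double-precision with a 64-bit integer and the positions 63, 62-52, and 51-0.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a single-precision value. */
struct sp_fields {
    unsigned sign;       /* bit 31 */
    unsigned exponent;   /* bits 30-23, still biased by 127 */
    uint32_t fraction;   /* bits 22-0 */
};

static struct sp_fields sp_decode(float f)
{
    uint32_t bits;
    struct sp_fields out;

    memcpy(&bits, &f, sizeof bits);       /* copy the raw 32-bit pattern */
    out.sign     = bits >> 31;
    out.exponent = (bits >> 23) & 0xFF;
    out.fraction = bits & 0x7FFFFF;
    return out;
}
```

For example, 1.0 decodes to sign 0, exponent 127 (that is, an unbiased exponent of 0), and fraction 0.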



Normalized Numbers 

A number is normalized if the exponent field contains some value other
than all 1's or all 0's.

The exponent field contains a biased exponent, where the bias is 127 in
single-precision, and 1023 in double-precision. Thus, the exponent of a
normalized floating point number is in the range -126 to 127 inclusive for
single-precision, and in the range -1022 to 1023 inclusive for
double-precision.

There is an implicit bit associated with both single- and
double-precision formats. The implicit bit is not explicitly stored
anywhere (thus its name). Logically, for normalized operands the implicit
bit has a value of 1 and resides immediately to the left of the binary
point (in the $2^{0}$ position). Thus the implicit bit and fraction field
together can represent values in the range 1 to $2 - 2^{-23}$ inclusive
for single-precision, and in the range 1 to $2 - 2^{-52}$ inclusive for
double-precision.

Thus normalized single-precision numbers can be in the range (plus or
minus) $2^{-126}$ to $(2 - 2^{-23}) \times 2^{127}$ inclusive.

Normalized double-precision numbers can be in the range (plus or
minus) $2^{-1022}$ to $(2 - 2^{-52}) \times 2^{1023}$ inclusive.
Denormalized Numbers

A number is denormalized if the exponent field contains all 0's and the
fraction field does not contain all 0's.

Thus denormalized single-precision numbers can be in the range (plus
or minus) $2^{-126} \times 2^{-23}$ $(= 2^{-149})$ to
$(1 - 2^{-23}) \times 2^{-126}$ inclusive.

Denormalized double-precision numbers can be in the range (plus or
minus) $2^{-1022} \times 2^{-52}$ $(= 2^{-1074})$ to
$(1 - 2^{-52}) \times 2^{-1022}$ inclusive.

Maximum and Minimum Representable Floating Point Values 

The maximum and minimum representable values in floating point for- 
mat are defined in the header file values.h: 

#define MAXDOUBLE 1.79769313486231470e+308 

#define MAXFLOAT ((float)3.40282346638528860e+38) 

#define MINDOUBLE 4.94065645841246544e-324 

#define MINFLOAT ((float)1.40129846432481707e-45) 
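The limits of the two formats can be cross-checked against the standard header float.h. The sketch below assumes an IEEE machine and a compiler that accepts C99 hexadecimal floating constants; the function name ranges_match is hypothetical. It confirms that the smallest and largest normalized values are $2^{-126}$ and $(2 - 2^{-23}) \times 2^{127}$ in single-precision and $2^{-1022}$ and $(2 - 2^{-52}) \times 2^{1023}$ in double-precision, and that MAXFLOAT, MINFLOAT, and MINDOUBLE as quoted from values.h are the largest single-precision value and the smallest denormalized values $2^{-149}$ and $2^{-1074}$.

```c
#include <float.h>

/* Hypothetical self-check: returns 1 when every limit matches. */
static int ranges_match(void)
{
    int ok = 1;

    ok &= (FLT_MIN == 0x1p-126f);                  /* smallest normalized single */
    ok &= (FLT_MAX == 0x1.fffffep127f);            /* (2 - 2^-23) * 2^127 */
    ok &= (DBL_MIN == 0x1p-1022);                  /* smallest normalized double */
    ok &= (DBL_MAX == 0x1.fffffffffffffp1023);     /* (2 - 2^-52) * 2^1023 */

    /* Literals copied from the values.h definitions quoted in the text. */
    ok &= ((float) 3.40282346638528860e+38 == FLT_MAX);     /* MAXFLOAT */
    ok &= ((float) 1.40129846432481707e-45 == 0x1p-149f);   /* MINFLOAT */
    ok &= (4.94065645841246544e-324 == 0x1p-1074);          /* MINDOUBLE */
    return ok;
}
```

Note that MINFLOAT and MINDOUBLE are denormalized values, so they are far smaller than FLT_MIN and DBL_MIN, which are the smallest normalized values.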



Special-Case Values 

The following table gives the names of special cases and how each is 
represented. 



Value Name             Sign   Exponent   Fraction
                                         MSB    Rest of Fraction

NaN (non-trapping)     X      Max        0      Nonzero
Trapping NaN           X      Max        1      X
Positive Infinity      0      Max        Min
Negative Infinity      1      Max        Min
Positive Zero          0      Min        Min
Negative Zero          1      Min        Min
Denormalized Number    X      Min        Nonzero
Normalized Number      X      NotMM      X

Key:

X          don't care
Max        maximum value that can be stored in the field (all 1's)
Min        minimum value that can be stored in the field (all 0's)
NotMM      field is not equal to either Min or Max values
Nonzero    field contains at least one "1" bit
MSB        Most Significant Bit



The algorithm for classification of a value into special cases follows: 

If (Exponent==Max)
    If (Fraction==Min)
        Then the number is Infinity (Positive or Negative
             as determined by the Sign bit).
        Else the number is NaN (Trapping if FractionMSB==0,
             non-Trapping if FractionMSB==1).
Else If (Exponent==Min)
    If (Fraction==Min)
        Then the number is Zero (Positive or Negative
             as determined by the Sign bit).
        Else the number is Denormalized.
Else the number is Normalized.

When the MAU generates a NaN, the fraction contains all ones. 
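The classification algorithm can be sketched in C for single-precision values. The names fp_kind and sp_classify are hypothetical, float is assumed to be the 32-bit IEEE format, and the sketch deliberately stops at "NaN" without separating trapping from non-trapping NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for the classification. */
enum fp_kind {
    FP_KIND_INFINITY, FP_KIND_NAN, FP_KIND_ZERO,
    FP_KIND_DENORMALIZED, FP_KIND_NORMALIZED
};

static enum fp_kind sp_classify(float f)
{
    uint32_t bits, exponent, fraction;

    memcpy(&bits, &f, sizeof bits);      /* raw bit pattern of the value */
    exponent = (bits >> 23) & 0xFF;      /* bits 30-23 */
    fraction = bits & 0x7FFFFF;          /* bits 22-0 */

    if (exponent == 0xFF)                /* Exponent == Max */
        return fraction == 0 ? FP_KIND_INFINITY : FP_KIND_NAN;
    if (exponent == 0)                   /* Exponent == Min */
        return fraction == 0 ? FP_KIND_ZERO : FP_KIND_DENORMALIZED;
    return FP_KIND_NORMALIZED;
}
```

The sign bit plays no part in the classification; it only distinguishes the positive and negative forms of zero and infinity.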

NaNs and Infinities 

The floating point functions provide two kinds of special representation: 

■ Infinity: Positive infinity in a format compares greater than all
other representable numbers in the same format. Arithmetic operations
on infinities are quite intuitive. For example, adding any
representable number to positive infinity is a valid operation, the
result of which is positive infinity. Subtracting positive infinity
from itself is invalid. If some arithmetic operation overflows, and the
overflow trap is disabled, in some rounding modes the delivered result
is infinity.

■ Not-a-Number: These floating point representations are not
numbers. They can be used to carry diagnostic information. There
are two kinds of NaNs: signaling NaNs and quiet NaNs. Signaling
NaNs raise the invalid operation exception whenever they are used
as operands in floating point operations. Quiet NaNs propagate
through most operations without raising any exception. The result of
these operations is the same quiet NaN. NaNs are sometimes produced
by the arithmetic operations themselves. For example, 0.0
divided by 0.0 when the invalid operation trap is disabled produces a
quiet NaN.
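The behavior of infinities and quiet NaNs can be observed with ordinary arithmetic. The sketch below assumes IEEE arithmetic with all traps disabled (masked); special_values_demo is a hypothetical name, and the volatile qualifiers only keep the compiler from evaluating the divisions at compile time.

```c
#include <float.h>

/* Hypothetical demonstration: returns 1 when infinities and NaNs behave
   as described, assuming the exception traps are disabled. */
static int special_values_demo(void)
{
    volatile double zero = 0.0, one = 1.0;
    double inf  = one / zero;     /* divide by zero delivers +infinity */
    double nan1 = zero / zero;    /* invalid operation delivers a quiet NaN */
    double nan2 = inf - inf;      /* infinity minus infinity is also invalid */
    int ok = 1;

    ok &= (inf > DBL_MAX);                  /* greater than every finite number */
    ok &= (inf + 1.0e300 == inf);           /* adding a finite number leaves infinity */
    ok &= (nan1 != nan1);                   /* a NaN is unequal even to itself */
    ok &= (nan2 != nan2);
    ok &= ((nan1 + 1.0) != (nan1 + 1.0));   /* a quiet NaN propagates */
    return ok;
}
```

The test x != x is a common way to detect a NaN when no library predicate is at hand.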

The header file ieeefp.h defines the 3B Computer family interface for 
the floating point exception and environment control. This header defines 
three interfaces: 

■ Rounding Control 

■ Exception Control 

■ Exception Handling 



Rounding Control 

The floating point arithmetic provides four rounding modes which 
affect the result of most floating point operations. These modes are also 
defined in the header ieeefp.h: 



        typedef enum fp_rnd {
            FP_RN = 0,    /* round to nearest representable
                             number, tie -> even */
            FP_RP = 1,    /* round toward plus infinity */
            FP_RM = 2,    /* round toward minus infinity */
            FP_RZ = 3     /* round toward zero (truncate) */
        } fp_rnd;
You can check the current rounding mode with the function:

        fp_rnd fpgetround();    /* return current rounding mode */

You can change the rounding mode for floating point operations with the
function:

        fp_rnd fpsetround(round);    /* set rounding mode, return previous */
        fp_rnd round;

(For more information, see fpgetround(3C) in the Programmer's Reference
Manual.)

The default rounding mode in the 3B Computer family is round-to- 
nearest. In C and FORTRAN (F77), floating point to integer conversions are 
always done by truncation, and the current rounding mode has no effect on 
these operations. 
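The truncation rule for floating point to integer conversion is easy to see in a small sketch. The helper name to_int is hypothetical; the point is that the cast discards the fractional part, moving toward zero for both positive and negative values, whatever rounding mode is in effect.

```c
/* Hypothetical helper: C conversion from double to int truncates. */
static int to_int(double x)
{
    return (int) x;   /* fractional part is discarded, never rounded */
}
```

So 2.9 converts to 2 and -2.9 converts to -2; a program that wants rounding to nearest has to arrange for it explicitly.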

Exceptions, Sticky Bit, and Trap Bits 

Floating point operations can lead to certain exceptional conditions: 
divide by zero is a common example. There are five types of floating point 
exceptions: 

■ Divide by zero exception 

This exception happens when a non-zero number is divided by float- 
ing point zero. 

■ Invalid operation exception 

All operations on signaling NaNs raise an invalid operation excep- 
tion. Zero divided by zero, infinity subtracted from infinity, infinity 
divided by infinity all raise this exception. When a quiet NaN is 
compared with the greater or lesser predicates, an invalid exception is 
raised. 

■ Overflow exception 

This exception occurs when the result of any floating point operation 
is too large in magnitude to fit in the intended destination. 

■ Underflow exception

When the underflow trap is enabled, underflow exception is signaled
when the result of some operation is a tiny non-zero number smaller
than the smallest representable number in that format. When the
underflow trap is disabled, an underflow exception occurs only when
both tininess and loss of accuracy are detected.

■ Inexact or imprecise exception

This exception is signaled if the rounded result of an operation is not
identical to the infinitely precise result. Inexact exceptions are quite
common. 1.0 / 3.0 is an inexact operation. Inexact exceptions also
occur when the operation overflows without an overflow trap.

NOTE

The above examples do not constitute an exhaustive list of the conditions
when an exception can occur.



There is a sticky bit associated with each of these exceptions. Whenever 
any of these exceptions occur, the corresponding sticky bit is set (=1). The 
sticky bits are all cleared at the start of a process. After that, they are never 
cleared by the floating point system, but are set to remember that an excep- 
tion occurred. 

You can check the status of the sticky bits by using the function:

        fp_except fpgetsticky();    /* return logged exceptions */

fp_except can have combinations of the following values:

        #define fp_except int

        #define FP_X_INV 0x10    /* invalid operation exception */
        #define FP_X_OFL 0x08    /* overflow exception */
        #define FP_X_UFL 0x04    /* underflow exception */
        #define FP_X_DZ  0x02    /* divide-by-zero exception */
        #define FP_X_IMP 0x01    /* imprecise (loss of precision) */

You can change the sticky bits by using the function:

        fp_except fpsetsticky(sticky);    /* set logged exceptions, return previous */
        fp_except sticky;
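Because the exception values are single bits, a value returned by fpgetsticky can be examined with ordinary bitwise operators. The sketch below is self-contained: it repeats the constants from the text rather than including ieeefp.h, and the helpers overflowed and logged_count are hypothetical names, not part of that header.

```c
/* Values copied from the text. On a system that provides ieeefp.h,
   include that header instead of redefining them. */
#define FP_X_INV 0x10    /* invalid operation exception */
#define FP_X_OFL 0x08    /* overflow exception */
#define FP_X_UFL 0x04    /* underflow exception */
#define FP_X_DZ  0x02    /* divide-by-zero exception */
#define FP_X_IMP 0x01    /* imprecise (loss of precision) */

/* Hypothetical helper: was an overflow logged in this sticky value? */
static int overflowed(int sticky)
{
    return (sticky & FP_X_OFL) != 0;
}

/* Hypothetical helper: how many distinct exceptions are logged? */
static int logged_count(int sticky)
{
    int bit, n = 0;

    for (bit = FP_X_IMP; bit <= FP_X_INV; bit <<= 1)
        if (sticky & bit)
            n++;
    return n;
}
```

An overflow without a trap typically logs two exceptions at once, overflow and imprecise, which is why the value is tested bit by bit rather than compared for equality.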

There is also a trap-enable bit (mask bit) associated with each exception.
When an exception occurs, if the corresponding trap bit is enabled (=1), a
trap takes place. These traps are precise traps; that is, the result of the
operation is not written when the trap takes place. You can check the status
of these mask bits by using the function:

        fp_except fpgetmask();    /* current exception mask */

You can also selectively enable or disable any of the exceptions by
calling the function:

        fp_except fpsetmask(mask);    /* set mask, return previous */
        fp_except mask;

with appropriate mask values.



NOTE 



You should make sure that the sticky bit corresponding to the excep- 
tion being enabled is cleared. 



In this compilation system, all the exceptions are masked by default. 
This behavior is different from earlier issues of the C Programming 
Language Utilities. This change was made to conform to the IEEE require- 
ments. 

Floating Point Exception Handling 

When the trap is enabled, floating point exceptions are signaled through 
the standard UNIX system signaling mechanism: that is, a SIGFPE is sent to 
the user process. A user who intends to handle the trap and proceed with 
the program must include the file ieeefp.h in at least one module of the 
program. The user can attach a handler to SIGFPE by calling the signal(2) 
routine. 

When a floating point exception handler is entered, two global variables 
are established: 

_fpftype    floating point fault type

            _fpftype identifies the primary exception type. Possible
            values for _fpftype are FP_UFLW, FP_DIVZ, INT_DIVZ, etc.
            (See the header file ieeefp.h.)

_fpfault    pointer to floating point exception structure

            _fpfault points to a structure which provides all other
            information about the floating point operation. The
            information _fpfault points to includes the type of operation
            being performed, the types and values of the operands, the
            type of a trapped value (if any), and the desired type of the
            result:
        struct fp_fault {
            fp_op     operation;
            fp_dunion operand[2];
            fp_dunion t_value;
            fp_dunion result;
        };

        extern struct fp_fault *_fpfault;



The operation field identifies the floating point operation that raised the
exception. The possible values are included in ieeefp.h. fp_dunion is a
discriminated union that contains information about the type and format of
the operands (or result) (for example, whether the operand is in
single-precision or double-precision). It also contains the actual values.
See ieeefp.h for exact definitions of fp_op and fp_dunion.

A user handler has the information about the floating point operation,
the operands, the computed result, and the format in which the result is to
be returned. The user handler can supply a result (by assignment to
_fpfault->result) in the right format, and when the handler returns, this
result is used to complete the floating point operation. If no result is
assigned by the user handler, a default result of 0.0 is used.
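Attaching a handler to SIGFPE follows the usual signal pattern. The sketch below is portable and therefore limited: the names fpe_seen, on_fpe, and install_and_probe are hypothetical, raise is used as a stand-in for a real trapped floating point exception, and the 3B-specific step of examining _fpftype and assigning to _fpfault->result is mentioned only in a comment because it depends on ieeefp.h.

```c
#include <signal.h>

static volatile sig_atomic_t fpe_seen = 0;

/* Hypothetical handler: records that SIGFPE arrived. On the 3B family a
   handler could also inspect _fpftype and supply _fpfault->result. */
static void on_fpe(int sig)
{
    (void) sig;
    fpe_seen = 1;
}

/* Hypothetical driver: installs the handler, then delivers SIGFPE with
   raise() in place of a trapped floating point operation. */
static int install_and_probe(void)
{
    if (signal(SIGFPE, on_fpe) == SIG_ERR)
        return -1;
    raise(SIGFPE);
    return fpe_seen;
}
```

In a real program the trap must also be enabled for the exception of interest, since all exceptions are masked by default.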



Conversion Between Binary and Decimal Values 

There are four functions in the C library which allow the programmer 
to convert binary values to binary coded decimal (BCD) values, and vice 
versa. These functions are _s2dec, _d2dec, _dec2s, and _dec2d. They are 
described in this chapter and on the decconv(3C) manual page in the 
Programmer's Reference Manual. 
The _s2dec function returns a decimal floating point value, given a
pointer to a single-precision binary floating point number and a precision 
specification. 

void _s2dec (x, d, p) 
float *x; 
decimal *d; 
int p; 

On input, the value of the ilen field in the decimal should be set to tell
how many decimal digits should be output in the mantissa for rounding
purposes. If the ilen field is not in the range 1 <= ilen <= 9, a NaN is
returned. If the input binary value x is a NaN or infinity, the returned
decimal d will be a NaN or infinity with the appropriate sign. The
exponential component of the returned decimal value is always two digits.

The parameter p (0 <= p <= ilen) specifies how many of the digits in the
output decimal mantissa string are to be considered to be to the right of the
implicit decimal point. If p is out of range, a NaN is returned.

The _d2dec function works like the _s2dec function except that it takes
a pointer to a double-precision value for x. The ilen field must be in the
range 1 <= ilen <= 17, and the exponential component of the returned
decimal will contain three digits.

void _d2dec (x, d, p) 
double *x; 
decimal *d; 
int p; 

The _dec2s function returns a single-precision binary floating point
value, given a decimal value and a precision specification.

void _dec2s (x, d, p) 
decimal *d; 
float *x; 
int p; 

The parameter p (0 <= p <= ilen) tells how many of the digits in the
mantissa string are to be considered to be to the right of an implicit decimal
point.
Because the decimal format can represent a larger range of numbers
than the binary formats, this conversion may overflow or underflow. Upon
overflow (underflow), a signed infinity (signed zero) is returned, and the
appropriate sticky bit is set.

The mantissa and exponent strings may contain leading zero characters.
But, once all leading zero characters are removed, the mantissa string
should have a length > 0 and <= 9. The exponent string should have a
length > 0 and <= 2. The special case of d == (decimal) is detected, in
which case the last characters in the string are not removed.

The _dec2d function is analogous to the _dec2s function except that it 
returns a double-precision value. After leading zero characters are removed, 
the mantissa string should contain no more than 17 digits and the exponent 
string should contain no more than three digits. 

void _dec2d (x, d, p)
decimal *d;
double *x;
int p;

The conversion library functions use the round control, mask, and 
sticky bits just like any other floating point operation. Rounding is per- 
formed according to the current rounding mode. The default mode is 
round-to-nearest. 

The conversion functions will set the following sticky bits, if appropri- 
ate: 

■ overflow 

■ underflow 

■ inexact result 

■ invalid operation 

If a trap occurs, the usual trap handling conventions are used. Thus, a trap 
handler which the user may have attached via signal(2) will also catch 
exceptions encountered during conversions between binary and decimal 
values. When a trap occurs, the following happens: 

■ the global variable _fpftype will be set to FP_CONV

■ the global variable _fpfault will point to the floating point exception
structure

■ the user's trap handler will be called

If the conversion was to decimal, the source operand will be either single- 
or double-precision and the intermediate result (the t_value field) will be 
decimal. 

If the conversion was from decimal, the source operand will be decimal 
and the result type will indicate the size of the expected result (i.e., single 
or double). The t_value field will be the same size as the result size unless 
an overflow or underflow occurred. In the case of an overflow or an 
underflow, an extended precision value will be returned with the exponent 
adjusted by 192 for the single-precision case or 1536 for the double- 
precision case. 

If the trap handler does not supply a return value when a trap occurs, 
the default zero value will be returned. 



Single-Precision Floating Point Operations

According to The C Programming Language, all floating point operations 
are performed in double-precision arithmetic. This can have severe perfor- 
mance penalties on some programs that use single-precision float variables. 
For example, the statements 

        float a,b;
        a += b;

lead to code that looks like 

convert b to double 

convert a to double 

add in double-precision 

convert to float and store result back in a 

The proposed ANSI standard for C has a provision that allows expressions
to be evaluated in single-precision arithmetic if there is no double (or
larger in size) operand in the expression. The C compiler supports an
option, -F (given to cc), which enables float arithmetic to be done in
single-precision. Given this option the previous statement can potentially
be translated into a single instruction.
Floating point constants are double-precision, unless explicitly stated to
be float. For example, in the statements 

float a,b; 
a = b + 1.0; 

because the constant 1.0 is of the size double-precision, b is promoted to 
double before the addition and the result is converted back to float, even if 
the -F option is specified. However, the constant can be explicitly cast to 
float: 

        a = b + (float) 1.0;

In that case, given the -F option, the statement can be potentially compiled
to a single instruction.
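The effect of the constant's type can be seen without running any arithmetic, because sizeof reports the type an expression would have. The sketch assumes a compiler that follows the ANSI rule described above, in which an expression with only float operands has type float; this corresponds to the behavior the text attributes to the -F option. The function name float_expr_sizes is hypothetical, and the f suffix in the last line is the ANSI C spelling of a float constant, not something the original text uses.

```c
/* Hypothetical check: returns 1 when the expression types are as described. */
static int float_expr_sizes(void)
{
    float b = 2.0f;
    int ok = 1;

    ok &= (sizeof(b + 1.0) == sizeof(double));         /* double constant promotes b */
    ok &= (sizeof(b + (float) 1.0) == sizeof(float));  /* cast keeps the sum in float */
    ok &= (sizeof(b + 1.0f) == sizeof(float));         /* f suffix has the same effect */
    return ok;
}
```

A single double constant anywhere in an expression is therefore enough to pull the surrounding float operands up to double-precision.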

Using the -F option can result in loss of precision. Also, long int
variables (same as int on AT&T 3B Computers) have more precision than float
variables. (They have a much smaller range.) Consider the following
example:

        int i, j;

        i = 0x7ffffff;
        j = i * 1.0;
        printf("j = %x\n", j);
        j = i * (float) 1.0;
        printf("j = %x\n", j);

When compiled with -F, the first printf statement outputs 7ffffff, while
the second one prints 8000000. Without the -F option both statements
print 7ffffff.
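The loss of precision comes from the 24 significant bits of a single-precision value. The sketch below isolates that step; through_float is a hypothetical name, IEEE single-precision with round-to-nearest is assumed, and the volatile qualifier forces the value to be stored as a float instead of being kept in a wider register.

```c
/* Hypothetical helper: pass an int through a float and back. */
static int through_float(int i)
{
    volatile float f = (float) i;   /* rounded to 24 significant bits */
    return (int) f;
}
```

The value 0x7ffffff needs 27 bits, so it rounds up to 0x8000000, while any value that fits in 24 bits, such as 0xffffff, survives unchanged.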

Also, function return values and arguments of type float /double are 
always in double-precision, so that if an expression contains functions and 
float variables, some unwanted type conversion may take place. 
Whether a computation can be done in single-precision is decided at the
lowest subexpression level. Consider the following: 

float s; 
double d; 

d = d + s * s; 

If the -F option is specified, s * s is computed to produce a
single-precision result, which is promoted to double-precision and added
to d.



NOTE 



The IEEE P854 task force which treats format independent floating 
point environment issues may disallow the multiplication to be carried 
in single-precision in this context; a future release of the C Program- 
ming Language Utilities may be modified to take that into account. 



Single-Precision Math Libraries 

The C Programming Language Utilities contain two single-precision
math libraries. The first and most commonly useful, libsm, is a regular
single-precision math library that contains the sin, cos, tan, asin, acos, atan,
exp, log, log10, pow, and sqrt functions. Most internal computations use
single-precision arithmetic.

You can improve run-time performance by using this single-precision 
math library whenever possible. However, it only contains a subset of the 
functions in libm, so you must be certain to search libm as well as libsm if you 
need it to resolve all references. When using this single-precision math 
library, you should use the -F option to cc (described above) as well. 

The other single-precision library, libsfm, is a special-purpose
single-precision assembly source math library that contains the functions
sin, cos, tan, asin, acos, atan, exp, log, log10, pow, and sqrt. The
routines in this source library are in-line expanded by the optimizer to
provide faster execution by reducing the overhead of argument passing,
function calling and returning, and return value passing. The source
library is designed for applications that desire an increase in speed at
the potential cost of size.



NOTE 



libsfm should be used only when necessary and with extreme caution. 
This library is a special purpose library which does not do domain 
reduction or error checking. In other words, these functions never call 
matherr, and arguments aren't reduced to be within a finite range. 
Inputs to sin and cos must lie within a restricted range. For tan, the
range is $-\pi/2 < x < \pi/2$. For sqrt, log, and log10, inputs must be
greater than 0.

Implicit Precision of Subexpressions 

Even though the MAU provides double-extended-precision as quickly as 
double-precision arithmetic, the C Programming Language Utilities adhere 
to the C requirement of doing computations in double-precision to maintain 
a high degree of source level portability. 

Suppose two double-precision numbers are multiplied and the result is 
divided by a third. If intermediate results are stored in double-extended, it 
can hide a floating point overflow exception which would occur on 
machines not supporting double-extended. The proposed ANSI standard 
for C (X3J11/85-102) requires a new data type called long double, which
would map to double-extended format on IEEE machines supporting it. 
The C Programming Language Utilities will be upgraded to include long 
doubles when the proposed ANSI standard is an accepted standard. 
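The multiply-then-divide case described above can be sketched as follows. This is an illustration under assumptions, not the compiler's own code: the function name is hypothetical, and the volatile store is used only to force the intermediate product into double format, as on a machine without double-extended.

```c
/* Hypothetical helper: computes (a * b) / c with the intermediate
 * product rounded to double before the division. */
double mul_div_double(double a, double b, double c)
{
    volatile double product = a * b;   /* store forces double format */
    return product / c;
}
```

With a and b both 1e200 and c equal to 1e300, the product is outside the double range, so the intermediate overflows to infinity and the quotient is infinite. A double-extended intermediate would have held the product and given a finite result near 1e100, hiding the overflow.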

IEEE Requirements 

All arithmetic computations generated by the C compiler strictly con- 
form to IEEE requirements. The following is a discussion of some topics 
where the C Programming Language Utilities falls short of completely 
meeting the ANSI/IEEE Std 754-1985 requirements or the spirit of the 
requirements. 

Remainder and Round to Integral 

Both these operations are available at the hardware level, but they do 
not have a direct mapping to C level source code. Floating point remainder 
is not defined in C, and the library function fmod(3M) is not the same as 
remainder. You can, however, access these operations through asms. 

Conversion of Floating Point Formats to Integer 

IEEE requires floating point to integer format conversions to be effected
by the current rounding mode. However, the C language requires these
conversions to be done by truncation (which is the same as round-to-zero).
In the C Programming Language Utilities, floating point to integer
conversions are done by truncation.

Conversion of floating point numbers to integers should signal integer
overflow or invalid operation for an overflow condition. In the current
implementation the integer overflow flag is set, but there is no way to
enable the overflow trap. Enabling the integer overflow trap would result
in a substantial performance penalty due to stalled pipeline effects.
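The truncation rule can be seen in a few lines of C. Both function names below are hypothetical. The second helper is a simple sketch of rounding half away from zero for callers who do not want truncation; it is valid only when the result fits in an int, and it is not correctly rounded for every input.

```c
/* A C cast from floating point to int truncates toward zero,
 * regardless of the current IEEE rounding mode. */
int to_int_truncated(double x)
{
    return (int)x;
}

/* Hypothetical alternative: bias by one half, then truncate. */
int to_int_nearest(double x)
{
    return (int)(x >= 0.0 ? x + 0.5 : x - 0.5);
}
```

For example, 2.7 truncates to 2 and -2.7 truncates to -2, while the biased version gives 3 and -3.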

Square Root 

IEEE requires the square root of a negative non-zero number to raise
invalid operation, whereas UNIX system compatibility requires square root
to generate a matherr() exception with errno set to EDOM. The sqrt(3F)
routine in the C Programming Language Utilities generates an EDOM error for
negative non-zero inputs. Other than that, the operation conforms to IEEE
requirements. However, if there is no MAU present, the square root library
may not return correctly rounded results.

Compares and Unordered Condition 

In addition to the usual relationships between floating point values (less 
than, equal, greater than), there is a fourth relationship: unordered. The 
unordered case arises when at least one operand is a NaN. Every NaN com- 
pares unordered with everything, including itself. 

The C Programming Language Utilities provides the following predi- 
cates required by IEEE between floating point operands: 

    ==    !=    <
    >     <=    >=



While there is no predicate to test for unordered, the isnan(3C)
routines/macros return TRUE if the argument is a NaN.

The relations >,>=,<, and <= raise invalid operation for unordered 
operands. The compiler generated code does not guard against the unor- 
dered outcome of a comparison. If the trap is masked, for unordered condi- 
tions the path taken is the same as if the conditional is true. This is the 
wrong path. 

For the predicates == and !=, unordered condition does not lead to 
invalid operation. The path taken for unordered condition is the same as 
when the operands are non-equal. This is the right path.

(a > b) is not the same as ( !(a <= b) ) in IEEE. The difference occurs
when b or a compares unordered. The C compiler generates the same code
for both cases.
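Since the generated code does not guard against the unordered outcome, a program that may encounter NaNs can test for them explicitly before using >, >=, <, or <=. A minimal sketch, with hypothetical function names; the x != x test relies on the behavior of != for NaNs described above.

```c
/* Returns 1 if a and b compare unordered, that is, if at least one
 * of them is a NaN. A NaN is the only value not equal to itself. */
int is_unordered(double a, double b)
{
    return (a != a) || (b != b);
}

/* Hypothetical guarded comparison: returns 1 if a > b, 0 if not,
 * and -1 if the operands are unordered. */
int greater_than(double a, double b)
{
    if (is_unordered(a, b))
        return -1;
    return a > b;
}
```

The three-way result lets the caller choose the path for the unordered case instead of inheriting whichever branch the conditional happens to take.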

NaNs and Infinities in Input/Output 

Floating point numbers which are NaNs or infinities are printed as 
such. NaNs are printed with their diagnostic values. 

Ideally, whatever printf outputs, scanf should be able to read using the 
same format. In this release, scanf does not recognize NaNs and infinities 
for floating point formats. However, since these special cases serve mostly as 
diagnostics for erroneous floating point computation, outputting these cases 
was considered more important than being able to read them. 

Conversion to and from Decimal 

While IEEE requires functions for converting to and from decimal, it 
does not specify the format of decimal numbers. The routines on the 
decconv(3C) manual page provide a form of binary coded decimal (BCD) 
number as found in many commercial floating point processors. For C pro- 
grammers, the printf(3C) and scanf(3C) routines are probably more 
relevant. The accuracy of conversion in these routines meets the IEEE 
requirements. However, these routines always work in the round-to-nearest
mode. The current rounding mode has no effect on them.

Glossary

Ada Named after the Countess of Lovelace, the
nineteenth century mathematician and computer
pioneer, Ada is a high-level general-purpose programming
language developed under the sponsorship
of the U.S. Department of Defense. Ada was
developed to provide consistency among programs
originating in different branches of the military.
Ada features include packages that make data objects
visible only to the modules that need them, task
objects that facilitate parallel processing, and an
exception handling mechanism that encourages
well-structured error processing.

ANSI standard ANSI is the acronym for the American National
Standards Institute. ANSI establishes guidelines in
the computing industry, from the definition of
ASCII to the determination of overall datacom system
performance. ANSI standards have been established
for both the Ada and FORTRAN programming
languages, and a standard for C has been proposed.

a.out file a.out is the default file name used by the link editor
when it outputs a successfully compiled, executable
file. a.out contains object files that are combined to
create a complete working program. Object file format
is described in Chapter 11, "The Common Object
File Format," and in a.out(4) in the Programmer's
Reference Manual.

application program An application program is a working program in a
system. Such programs are usually unique to one
type of users' work, although some application programs
can be used in a variety of business situations.
An accounting application, for example, may well be
applicable to many different businesses.

archive An archive file or archive library is a collection of
data gathered from several files. Each of the files
within an archive is called a member. The command
ar(1) collects data for use as a library.

argument An argument is additional information that is passed

to a command or a function. On a command line, an 
argument is a character string or number that fol- 
lows the command name and is separated from it by 
a space. There are two types of command-line arguments:
options and operands. Options are immediately
preceded by a minus sign (-) and change the
execution or output of the command. Some options
can themselves take arguments. Operands are preceded
by a space and specify files or directories that
will be operated on by the command. For example,
in the command

pr -t -h Heading file

all of the elements after the pr are arguments. -t
and -h are options. Heading is an argument to the
-h option, and file is an operand.

For a function, arguments are enclosed within a pair 
of parentheses immediately following the function 
name. The number of arguments can be zero or
more; if two or more are present they are
separated by commas and the whole list enclosed by
the parentheses. The formal definition of a function,
such as might be found on a page in Section 3 of the 
Programmer's Reference Manual, describes the number 
and data type of argument(s) expected by the func- 
tion. 

ASCII ASCII is an acronym for American Standard Code
for Information Interchange, a standard for data 
representation that is followed in the UNIX system. 
ASCII code represents alphanumeric characters as 
binary numbers. The code includes 128 upper- and 
lower-case letters, numerals, and special characters. 
Each alphanumeric and special character has an 
ASCII code (binary) equivalent that is one byte long. 

assembler The assembler is a translating program that accepts
instructions written in the assembly language of the
computer and translates them into the binary
representation of machine instructions. In many
cases, the assembly language instructions map 1 to 1
with the binary machine instructions.

assembly language A programming language that uses the instruction 

set that applies to a particular computer. 

BASIC BASIC is a high-level conversational programming 

language that allows a computer to be used much 
like a complex electronic calculating machine. The 
name is an acronym for Beginner's All-purpose Symbolic
Instruction Code.

branch table A branch table is an implementation technique for 

fixing the addresses of text symbols, without forfeit- 
ing the ability to update code. Instead of being 
directly associated with function code, text symbols 
label jump instructions that transfer control to the 
real code. Branch table addresses do not change, 
even when one changes the code of a routine. Jump
table is another name for branch table.

buffer A buffer is a storage space in computer memory 

where data are stored temporarily into convenient 
units for system operations. Buffers are often used 
by programs, such as editors, that access and alter 
text or data frequently. When you edit a file, a copy
of its contents is read into a buffer where you make
changes to the text. For the changes to become part 
of the permanent file, you must write the buffer con- 
tents back into the permanent file. This replaces the 
contents of the file with the contents of the buffer. 
When you quit the editor, the contents of the buffer 
are flushed. 

byte A byte is a unit of storage in the computer. On
many UNIX systems, a byte is eight bits (binary
digits), the equivalent of one character of text.

byte order Byte order refers to the order in which data are
stored in computer memory.

C The C programming language is a general-purpose
programming language that features economy of
expression, control flow, data structures, and a
variety of operators. It can be used to perform both
high-level and low-level tasks. Although it has been
called a system programming language, because it is 
useful for writing operating systems, it has been 
used equally effectively to write major numerical, 
text-processing, and data base programs. The C pro- 
gramming language was designed for and imple- 
mented on the UNIX system; however, the language 
is not limited to any one operating system or 
machine. 

C compiler The C compiler converts C programs into assembly 

language programs that are eventually translated 
into object files by the assembler. 

C preprocessor The C preprocessor is a component of the C Compi- 

lation System. In C source code, statements pre- 
ceded with a pound sign (#) are directives to the 
preprocessor. Command line options of the cc(1)
command may also be used to control the actions of 
the preprocessor. The main work of the preproces- 
sor is to perform file inclusions and macro substitu- 
tion. 

CCS CCS is an acronym for C Compilation System, which 

is a set of programming language utilities used to 
produce object code from C source code. The major 
components of a C Compilation System are a C 
preprocessor, C compiler, assembler, and link editor. 
The C preprocessor accepts C source code as input, 
performs any preprocessing required, then passes 
the processed code to the C compiler, which pro- 
duces assembly language code that it passes to the 
assembler. The assembler in turn produces object 
code that can be linked to other object files by the 
link editor. The object files produced are in the 
Common Object File Format (COFF). Other com- 
ponents of CCS include a symbolic debugger, an 
optimizer that makes the code produced as efficient 
as possible, productivity tools, tools used to read and 
manipulate object files, and libraries that provide 
runtime support, access to system calls, 
input/output, string manipulation, mathematical 
functions, and other code processing functions. 






COBOL COBOL is an acronym for COmmon Business
Oriented Language. COBOL is a high-level pro- 
gramming language designed for business and com- 
mercial applications. The English-language state- 
ments of COBOL provide a relatively machine- 
independent method of expressing a business- 
oriented problem to the computer. 

COFF COFF is an acronym for Common Object File Format.
COFF refers to the format of the output file
produced on some UNIX systems by the assembler 
and the link editor. This format is also used by 
other operating systems. The following are some of 
its key features: 

□ Applications may add system-dependent infor- 
mation to the object file without causing access 
utilities to become obsolete. 

□ Space is provided for symbolic information used 
by debuggers and other applications. 

□ Users may make some modifications in the 
object file construction at compile time.

command A command is the term commonly used to refer to 

an instruction that a user types at a computer termi- 
nal keyboard. It can be the name of a file that con- 
tains an executable program or a shell script that can 
be processed or executed by the computer on 
request. A command is composed of a word or 
string of letters and/or special characters that can
continue for several (terminal) lines, up to 256 char- 
acters. A command name is sometimes used inter- 
changeably with a program name. 

command line A command line is composed of the command name 

followed by any argument(s) required by the com- 
mand or optionally included by the user. The 
manual page for a command includes a command 
line synopsis in a notation designed to show the 
correct way to type in a command, with or without 
options and arguments.

compiler A compiler transforms the high-level language
instructions in a program (the source code) into 
object code or assembly language. Assembly 
language code may then be passed to the assembler 
for further translation into machine instructions. 

core Core is a (mostly archaic) synonym for primary
memory. 

core file A core file is an image of a terminated process saved
for debugging. A core file is created under the 
name "core" in the current directory of the process 
when an abnormal event occurs resulting in the pro- 
cess' termination. A list of these events is found in 
the signal(2) manual page in section 2 of the 
Programmer's Reference Manual. 

core image Core image is a copy of all the segments of a running
or terminated program. The copy may exist in
main storage, in the swap area, or in a core file. 

curses curses(3X) is a library of C routines that are
designed to handle input, output, and other opera- 
tions in screen management programs. The name 
curses comes from the cursor optimization that the 
routines provide. When a screen management pro- 
gram is run, cursor optimization minimizes the 
amount of time a cursor has to move about a screen 
to update its contents. The program refers to the 
terminfo(4) data base at run time to obtain the infor- 
mation that it needs about the screen (terminal) 
being used. See terminfo(4) in the Programmer's 
Reference Manual. 

data symbol A data symbol names a variable that may or may not
be initialized. Normally, these variables reside in
read/write memory during execution. See text symbol.

data base A data base is a bank of information on a particular
subject or subjects. On-line data bases are designed 
so that by using subject headings, key words, or key 
phrases you can search for, analyze, update, and 
print out data.

debugging Debugging is the process of locating and correcting
errors in computer programs. 

default A default is the way a computer will perform a task
in the absence of other instructions. 

delimiter A delimiter is an initial character that identifies the
next character or character string as a particular kind 
of argument. Delimiters are typically used for 
option names on a command line; they identify the 
associated word as an option (or as a string of 
several options if the options are bundled). In the 
UNIX system command syntax, a minus sign (-) is
most often the delimiter for option names, for exam- 
ple, -s or -n, although some commands also use a 
plus sign (+). 

directory A directory is a type of file used to group and organize
other files or directories. A directory consists of
entries that specify further files (including direc- 
tories) and constitutes a node of the file system. A 
subdirectory is a directory that is pointed to by a 
directory one level above it in the file system organi- 
zation. 

The ls(1) command is used to list the contents of a
directory. When you first log onto the system, you
are in your home directory ($HOME). You can
move to another directory by using the cd(1) command
and you can print the name of the current
directory by using the pwd(1) command. You can
also create new directories with the mkdir(1) command
and remove empty directories with rmdir(1).

directory name A directory name is a string of characters that
identifies a directory. It can be a simple directory 
name, the relative path name or the full path name 
of a directory. 

dynamic linking Dynamic linking refers to the ability to resolve sym- 

bolic references at run time. Systems that use 
dynamic linking can execute processes without 
resolving unused references. See static linking.

environment An environment is a collection of resources used to

support a function. In the UNIX system, the shell 
environment is composed of variables whose values 
define the way you interact with the system. For 
example, your environment includes your shell 
prompt string, specifics for backspace and erase char- 
acters, and commands for sending output from your 
terminal to the computer. 

environment variable An environment variable is a shell variable such as
$HOME (which stands for your login directory) or
$PATH (which is a list of directories the shell will
search through for executable commands) that is part
of your environment. When you log in, the system
executes programs that create most of the environment
variables that you need for the commands to
work. These variables come from /etc/profile, a file 
that defines a general working environment for all 
users when they log onto a system. In addition, you 
can define and set variables in your personal .profile 
file, which you create in your login directory to 
tailor your own working environment. You can also 
temporarily set variables at the shell level. 
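A C program reads these variables with getenv(3C). The sketch below is only an illustration; the helper name env_or_default is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: returns the value of an environment variable,
 * or the supplied fallback when the variable is not set. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}
```

For example, env_or_default("HOME", "/") gives the login directory when $HOME is set and "/" otherwise.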

executable file An executable file is a file that can be processed or 

executed by the computer without any further trans- 
lation. That is, when you type in the file name, the 
commands in the file are executed. An object file 
that is ready to run (ready to be copied into the 
address space of a process to run as the code of that 
process) is an executable file. Files containing shell 
commands are also executable. A file may be given 
execute permission by using the chmod(1) command.
In addition to being ready to run, a file in the UNIX 
system needs to have execute permission. 

exit A specific system call that causes the termination of 

a process. The exit(2) call will close any open files 
and clean up most other information and memory 
which was used by the process.

exit status: return code An exit status or return code is a code number
returned to the shell when a command is terminated
that indicates the cause of termination.

exported symbol A symbol that a shared library defines and makes
available outside the library. See imported symbol.

expression An expression is a mathematical or logical symbol or
meaningful combination of symbols. See regular
expression.

file A file is an identifiable collection of information
that, in the UNIX system, is a member of a file system.
A file is known to the UNIX system as an
inode plus the information the inode contains that
tells whether the file is a plain file, a special file, or
a directory. A plain file may contain text, data, programs
or other information that forms a coherent
unit. A special file is a hardware device or portion
thereof, such as a disk partition. A directory is a
type of file that contains the names and inode
addresses of other plain, special or directory files.

file and record locking The phrase "file and record locking" refers to
software that protects records in a data file against
the possibility of being changed by two users at the
same time. Records (or the entire file) may be
locked by one authorized user while changes are
made. Other users are thus prevented from working
with the same record until the changes are completed.

file descriptor A file descriptor is a number assigned by the operating
system to a file when the file is opened by a process.
File descriptors 0, 1, and 2 are reserved; file
descriptor 0 is reserved for standard input (stdin), 1
is reserved for standard output (stdout), and 2 is
reserved for standard error output (stderr).
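The reserved descriptors can be used directly with write(2). A minimal sketch follows; the helper name put_message is hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Writes a message to the given file descriptor using write(2).
 * Returns 0 if the whole message was written, -1 otherwise. */
int put_message(int fd, const char *msg)
{
    size_t len = strlen(msg);
    return write(fd, msg, len) == (ssize_t)len ? 0 : -1;
}
```

Calling put_message(1, "done\n") writes to standard output, and put_message(2, "failed\n") writes to standard error.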

file system A UNIX file system is a hierarchical collection of

directories and other files that are organized in a 
tree structure. The base of the structure is the root 
(/) directory; other directories, all subordinate to the 
root, are branches. The collection of files can be 
mounted on a block special file. Each file of a file 
system appears exactly once in the inode list of the 
file system and is accessible via a single, unique path 
from the root directory of the file system. 

filter A filter is a program that reads information from 

standard input, acts on it in some way, and sends its 
results to standard output. It is called a filter 
because it can be used as a data transformer in a 
pipeline. Filters are different from editors and other 
commands because filters do not change the contents 
of a file. Examples of filters are grep(1) and tail(1),
which select and output part of the input; sort(1),
which sorts the input; and wc(1), which counts the
number of words, characters, and lines in the input.
sed(1) and awk(1) are also filters but they are called
programmable filters or data transformers because a 
program must be supplied as input in addition to 
the data to be transformed. 

flag A flag or option is used on a command line to signal 

a specific condition to a command or to request par- 
ticular processing. UNIX system flags are usually 
indicated by a leading hyphen (-). The word option
is sometimes used interchangeably with flag. Flag is 
also used as a verb to mean to point out or to draw 
attention to. See option. 

fork fork(2) is a system call that divides a process
into two, the parent and child processes, with
separate, but initially identical, text, data, and stack
segments. After the duplication, the child (created)
process is given a return code of 0 and the parent is
given the process id of the newly created child as
the return code.
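The two return codes are what let one piece of code tell parent from child. The following is a minimal sketch; the function name run_child is hypothetical, and the child does nothing except exit with the status it is given.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits at once with the given status (0-255) and
 * returns the status collected by the parent, or -1 on error. */
int run_child(int child_status)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;              /* fork failed; no child was created */
    if (pid == 0)
        _exit(child_status);    /* child: fork returned 0 */

    /* parent: fork returned the process id of the child */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The value the parent collects is the child's exit status, the same code the shell receives when a command terminates.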

FORTRAN FORTRAN is an acronym for FORmula TRANslator.

FORTRAN is a high-level programming language 
originally designed for scientific and engineering 
calculations but now also widely adapted for many 
business uses. 

function A function is a task done by a computer. In most 

modern programming languages, programs are made 
up of functions and procedures which perform small 
parts of the total job to be done. 

header file A header file is used in programming and in docu- 

ment formatting. In a programming context, a 
header file is a file that usually contains shared data 
declarations that are to be copied into source pro- 
grams as they are compiled. A header file includes 
symbolic names for constants, macro definitions, 
external variable references and inclusion of other 
header files. The name of a header file customarily 
ends with '.h' (dot-h). Similarly, in a document for- 
matting context, header files contain general format- 
ting macros that describe a common document type 
and can be used with many different document 
bodies. 

high-level language A high-level language is a computer programming 

language such as C, FORTRAN, COBOL, or PASCAL 
that uses symbols and command statements 
representing actions the computer is to perform, the 
exact steps for a machine to follow. A high-level 
language must be translated into machine language 
by a compilation system before a computer can exe- 
cute it. A characteristic of a high-level language is 
that each statement usually translates into a series of 
machine language instructions. The low-level 
details of the computer's internal organization are 
left to the compilation system. 

host machine A host machine is the machine on which an a.out 

file is built.

imported symbol A symbol used but not defined by a shared library.

See exported symbol. 

interpreted language An interpreted language is a high-level language 

that is not translated by a compilation system and 
stored in an executable object file. The statements of 
a program in an interpreted language are translated 
each time the program is executed. 

Interprocess Communication 

Interprocess Communication describes software that 
enables independent processes running at the same
time to exchange information through messages,
semaphores, or shared memory. 

interrupt An interrupt is a break in the normal flow of a sys- 

tem or program. Interrupts are initiated by signals 
that are generated by a hardware condition or a peri- 
pheral device indicating that a certain event has 
happened. When the interrupt is recognized by the 
hardware, an interrupt handling routine is executed. 
An interrupt character is a character (normally 
ASCII) that, when typed on a terminal, causes an 
interrupt. You can usually interrupt UNIX programs 
by pressing the delete or break keys, by typing 
Control-d, or by using the kill(1) command.

I/O (Input/Output) I/O is the process by which information enters

(input) and leaves (output) the computer system. 

kernel The kernel (comprising 5 to 10 percent of the operating
system software) is the basic resident software
on which the UNIX system relies. It is responsible 
for most operating system functions. It schedules 
and manages the work done by the computer and 
maintains the file system. The kernel has its own 
text, data, and stack areas. 

lexical analysis Lexical analysis is the process by which a stream of 

characters (often comprising a source program) is 
subdivided into its elementary words and symbols 
(called tokens). The tokens include the reserved 
words of the language, its identifiers and constants, 
and special symbols such as = and ;. Lexical
analysis enables you to recognize, for example, that
the stream of characters 'print("hello, universe")' is to
be analyzed into a series of tokens beginning with
the word 'print' and not with, say, the string
'print("h'. In compilers, a lexical analyzer is often
called by the compiler's syntactic analyzer or parser, 
which determines the statements of the program 
(that is, the proper arrangements of its tokens). 

library A library is an archive file that contains object code 

and/or files for programs that perform common
tasks. The library provides a common source for 
object code, thus saving space by providing one copy 
of the code instead of requiring every program that 
wants to incorporate the functions in the code to 
have its own copy. The link editor may select func- 
tions and data as needed. 

link editor A link editor, or loader, collects and merges 

separately compiled object files by linking together 
object files and the libraries that are referenced into 
executable load modules. The result is an a.out file. 
Link editing may be done automatically when you 
use the compilation system to process your programs 
on the UNIX system, but you can also link edit previously
compiled files by using the ld(1) command.

magic number The magic number is contained in the header of an 

a.out file. It indicates what the type of the file is, 
whether shared or non-shared text, and on which 
processor the file is executable. 

makefile A makefile is a file that lists dependencies among 

the source code files of a software product and 
methods for updating them, usually by recompilation.
The make(1) command uses the makefile to
maintain self-consistent software. 

manual page A manual page, or "man page" in UNIX system jar- 

gon, is the repository for the detailed description of 
a command, a system call, subroutine or other UNIX 
system component.

null pointer A null pointer is a C pointer with a value of 0.

object code Object code is executable machine-language code 

produced from source code or from other object files 
by an assembler or a compilation system. An object 
file is a file of object code and associated data. An 
object file that is ready to run is an executable file. 

optimizer An optimizer, an optional step in the compilation 

process, improves the efficiency of the assembly 
language code. The optimizer reduces the space 
used by and speeds the execution time of the code. 

option An option is an argument used in a command line to 

modify program output by modifying the execution 
of a command. An option is usually one character
preceded by a hyphen (-). When you do not specify
any options, the command will execute according to
its default options. For example, in the command
line

ls -a -l directory

-a and -l are the options that modify the ls(1) command
to list all directory entries, including entries
whose names begin with a period (.), in the long
format (including permissions, size, and date).

parent process A parent process occurs when a process is split into 

two, a parent process and a child process, with 
separate, but initially identical text, data, and stack 
segments. 

parse To parse is to analyze a sentence in order to identify 

its components and to determine their grammatical 
relationship. In computer terminology the word has 
a similar meaning, but instead of sentences, program 
statements or commands are analyzed. 

PASCAL PASCAL is a multipurpose high-level programming 

language often used to teach programming. It is 
based on the ALGOL programming language and 
emphasizes structured programming.

path name A path name is a way of designating the exact location
of a file in a file system. It is made up of a
series of directory names that proceed down the 
hierarchical path of the file system. The directory 
names are separated by a slash character (/). The 
last name in the path is either a file or another direc- 
tory. If the path name begins with a slash, it is 
called a full path name; the initial slash means that 
the path begins at the root directory. 

A path name that does not begin with a slash is 
known as a relative path name, meaning relative to 
the present working directory. A relative path name 
may begin either with a directory name or with two 
dots followed by a slash (•,/). One that begins with 
a directory name indicates that the ultimate file or 
directory is below the present working directory in 
the hierarchy. One that begins with indicates 
that the path first proceeds up the hierarchy; ../is 
the parent of the present working directory. 
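
The distinction between full and relative path names can be sketched as a small shell procedure. The name classify_path and the sample path names are assumptions made for illustration; this is not a standard command.

```shell
# classify_path is a hypothetical helper, not a standard command.
classify_path() {
    case "$1" in
        /*)   echo "full" ;;                  # begins at the root directory
        ../*) echo "relative, via parent" ;;  # first proceeds up the hierarchy
        *)    echo "relative" ;;              # starts at the present working directory
    esac
}

classify_path /usr/bin/ls
classify_path ../src/main.c
classify_path src/main.c
```

The procedure only inspects the leading characters of the name; it does not check that the file exists.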

permissions    Permissions are a means of defining a right to access
a file or directory in the UNIX file system. Permissions
are granted separately to you, the owner of the
file or directory, your group, and all others. There
are three basic permissions:

□ Read permission (r) includes permission to cat,
pg, lp, and cp a file.

□ Write permission (w) is the permission to
change a file.

□ Execute permission (x) is the permission to run
an executable file.

Permissions can be changed with the UNIX system
chmod(1) command.
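
A short sketch of changing and inspecting permissions follows. The scratch file and the mode 640 are chosen only for illustration, and mktemp is assumed to be available.

```shell
# Hypothetical scratch file; the mode 640 is only an illustration.
file=$(mktemp)

# Owner: read and write; group: read; all others: no access.
chmod 640 "$file"

# The first ten characters of the long listing show the file type
# followed by the owner, group, and other permissions.
ls -l "$file" | cut -c1-10
```

For a regular file with mode 640, the ten characters read -rw-r-----.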

pipe    A pipe causes the output of one command to be used
as the input for the next command so that the two
run in sequence. You can do this by preceding each
command after the first command with the pipe
symbol (|), which indicates that the output from the
process on the left should be routed to the process
on the right. For example, in the command

who | wc -l

the output from the who(1) command, which lists
the users who are logged on to the system, is used as
input for the word-count command, wc(1), with the
-l option. The result of this pipeline (succession of
commands connected by pipes) is the number of
people who are currently logged on to the system.
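
Because the output of who(1) changes from moment to moment, the sketch below substitutes made-up login records so that the result is reproducible. The names and terminal lines are assumptions.

```shell
# Made-up login records stand in for the output of who(1).
logins='alice tty01
bob tty02
carol tty03'

# Two-stage pipeline: the lines written on the left are counted on the right.
printf '%s\n' "$logins" | wc -l

# Pipelines can be longer: keep the first field, then sort the names in reverse.
printf '%s\n' "$logins" | cut -d' ' -f1 | sort -r
```

The first pipeline counts three lines; the second lists carol, bob, and alice in that order.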

portable    Portability describes the degree of ease with which a
program or a library can be moved or ported from
one system to another. Portability is desirable
because once a program is developed it is used on
many systems. If the program writer must change
the program in many different ways before it can be
distributed to the other systems, time is wasted, and
each modification increases the chances for an error.

preprocessor    Preprocessor is a generic name for a program that
prepares an input file for another program. For
example, neqn(1) and tbl(1) are preprocessors for
nroff(1). grap(1) is a preprocessor for pic(1). cpp(1)
is a preprocessor for the C compiler.

process    A process is a program that is at some stage of execution.
In the UNIX system, it also refers to the execution
of a computer environment, including contents
of memory, register values, name of the current
working directory, status of files, information
recorded at login time, etc. Every time you type the
name of a file that contains an executable program,
you initiate a new process. Shell programs can cause
the initiation of many processes because they can
contain many command lines.

The process id is a unique system-wide identification
number that identifies an active process. The process
status command, ps(1), prints the process ids of
the processes that belong to you.
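
The shell itself exposes process ids, which gives a quick way to see that each command starts a new process. The sketch below uses only the shell's $$ and $! parameters; the one-second sleep is an arbitrary stand-in for real work.

```shell
# $$ expands to the process id of the shell running these lines.
echo "shell process id: $$"

# Running a command in the background starts a new process;
# $! holds the process id of the most recent background command.
sleep 1 &
child=$!
echo "background process id: $child"

# wait blocks until that process has finished.
wait "$child"
```

The two ids differ, since the background command runs as its own process.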
program    A program is a sequence of instructions or commands
that cause the computer to perform a specific
task, for example, changing text, making a calculation,
or reporting on the status of the system. A
subprogram is part of a larger program and can be
compiled independently.

regular expression    A regular expression is a string of alphanumeric
characters and special characters that describe a character
string. It is a shorthand way of describing a
pattern to be searched for in a file. The pattern-matching
functions of ed(1) and grep(1), for example,
use regular expressions.

routine    A routine is a discrete section of a program to
accomplish a set of related tasks.

semaphore    In the UNIX system, a semaphore is a sharable short
unsigned integer maintained through a family of
system calls which include calls for increasing the
value of the semaphore, setting its value, and
blocking while waiting for its value to reach some value.
Semaphores are part of the UNIX system IPC facility.

shared library    Shared libraries include object modules that may be
shared among several processes at execution time.

shared memory    Shared memory is an IPC (interprocess communication)
facility in which two or more processes can
share the same data space.

shell    The shell is the UNIX system program, sh(1),
responsible for handling all interaction between you
and the system. It is a command language interpreter
that understands your commands and causes
the computer to act on them. The shell also establishes
the environment at your terminal. A shell
normally is started for you as part of the login process.
Three shells, the Bourne shell, the Korn shell
and the C shell, are popular. The shell can also be
used as a programming language to write procedures
for a variety of tasks.
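
As a small sketch of the shell used as a programming language, the procedure below wraps grep(1) and one of the regular expressions described earlier in this group of entries. The procedure name count_matches and the sample input are assumptions.

```shell
# count_matches is a hypothetical procedure: it reports how many lines of
# standard input match the regular expression given as its first argument.
count_matches() {
    grep -c "$1"
}

# ^[a-z]*[0-9]$ matches lines made of lowercase letters followed by one digit.
printf 'tty1\nconsole\ntty2\n' | count_matches '^[a-z]*[0-9]$'
```

Of the three sample lines, tty1 and tty2 match the pattern and console does not.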
signal: signal number

A signal is a message that you send to processes or
processes send to one another. The most common
signals you might send to a process are ones that
would cause the process to stop: for example, interrupt,
quit, or kill. A signal sent by a running process
is usually a sign of an exceptional occurrence
that has caused the process to terminate or divert
from the normal flow of control.

source code    Source code is the programming-language version of
a program. Before the computer can execute the
program, the source code must be translated to
machine language by a compilation system or an
interpreter.

standard error    Standard error is an output stream from a program.
It is normally used to convey error messages. In the
UNIX system, the default case is to associate standard
error with the user's terminal.

standard input    Standard input is an input stream to a program. In
the UNIX system, the default case is to associate
standard input with the user's terminal.

standard output    Standard output is an output stream from a program.
In the UNIX system, the default case is to associate
standard output with the user's terminal.

stdio: standard input-output

stdio(3S) is a collection of functions for formatted
and character-by-character input-output at a higher
level than the basic read, write, and open operations.

static linking    Static linking refers to the requirement that symbolic
references be resolved before run time. See dynamic
linking.

stream

□ A stream is an open file with buffering provided
by the stdio package.

□ A stream is a full-duplex processing and data
transfer path in the kernel. It implements a
connection between a driver in kernel space
and a process in user space, providing a general
character input/output interface for the user
processes.
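
The three standard streams defined in this group of entries default to the terminal, but the shell can associate each of them with a file instead. The sketch below assumes a hypothetical procedure named report and scratch files created with mktemp.

```shell
# report is a hypothetical command that writes one line to each output stream.
report() {
    echo "result: 42"               # standard output
    echo "warning: low disk" >&2    # standard error
}

out_file=$(mktemp)
err_file=$(mktemp)

# > redirects standard output and 2> redirects standard error,
# so neither stream reaches the terminal.
report > "$out_file" 2> "$err_file"

# < makes a file the standard input of a command instead of the terminal.
wc -l < "$out_file"
```

After the redirection, each file holds exactly the line written to the corresponding stream.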

string    A string is a contiguous sequence of characters
treated as a unit. Strings are normally bounded by
white space(s), tab(s), or a character designated as a
separator. A string value is a specified group of
characters symbolized to the shell by a variable.

strip    strip(1) is a command that removes the symbol table
and relocation bits from an executable file.

subroutine    A subroutine is a program that defines desired
operations and may be used in another program to
produce the desired operations. A subroutine can be
arranged so that control may be transferred to it
from a master routine and so that, at the conclusion
of the subroutine, control reverts to the master routine.
Such a subroutine is usually called a closed
subroutine. A single routine may be simultaneously
a subroutine with respect to another routine and a
master routine with respect to a third.

symbol table    A symbol table describes information in an object
file about the names and functions in that file. The
symbol table and relocation bits are used by the link
editor and by the debuggers.

symbol value    The value of a symbol, typically its virtual address,
used to resolve references.

syntax

□ Command syntax is the order in which command
names, options, option arguments, and
operands are put together to form a command
on the command line. The command name is
first, followed by options and operands. The
order of the options and the operands varies
from command to command.

□ Language syntax is the set of rules that describe
how the elements of a programming language
may legally be used.

system call    A system call is a request by an active process for a
service performed by the UNIX system kernel, such
as I/O, process creation, etc. All system operations
are allocated, initiated, monitored, manipulated, and
terminated through system calls. System calls allow
you to request the operating system to do some work
that the program would not normally be able to do.
For example, the getuid(2) system call allows you to
inspect information that is not normally available
since it resides in the operating system's address
space.

target machine    A target machine is the machine on which an a.out
file is run. While it may be the same machine on
which the a.out file was produced, the term implies
that it may be a different machine.

TCP/IP (Transmission Control Protocol/Internetwork Protocol)

TCP/IP is a connection-oriented, end-to-end reliable
protocol designed to fit into a layered hierarchy of
protocols that support multi-network applications. It
is the Department of Defense standard in packet networks.

terminal definition    A terminal definition is an entry in the terminfo(4)
data base that describes the characteristics of a terminal.
See terminfo(4) and curses(3X) in the
Programmer's Reference Manual.

terminfo

□ a group of routines within the curses library
that handle certain terminal capabilities. For
example, if your terminal has programmable
function keys, you can use these routines to
program the keys.

□ a data base containing the compiled descriptions
of many terminals that can be used with
curses(3X) screen management programs. These
descriptions specify the capabilities of a terminal
and how it performs various operations,
for example, how many lines and columns it
has and how its control characters are interpreted.
A curses(3X) program refers to the data
base at run time to obtain the information that
it needs about the terminal being used.

See curses(3X) in the Programmer's Reference Manual.
terminfo(4) routines can be used in shell programs,
as well as C programs.

text symbol    A text symbol is a symbol, usually a function name,
that is defined in the .text portion of an a.out file.

tool    A tool is a program, or package of programs, that
performs a given task.

trap    A trap is a condition caused by an error where a process
state transition occurs and a signal is sent to the
currently running process.

UNIX operating system

The UNIX operating system is a general-purpose,
multiuser, interactive, time-sharing operating system
developed by AT&T. An operating system is the
software on the computer under which all other
software runs. The UNIX operating system has two
basic parts:

□ The kernel is the program that is responsible
for most operating system functions. It
schedules and manages all the work done by
the computer and maintains the file system. It
is always running and is invisible to users.

□ The shell is the program responsible for handling
all interaction between users and the computer.
It includes a powerful command
language called shell language.

The utility programs or UNIX system commands are
executed using the shell, and allow users to communicate
with each other, edit and manipulate files,
and write and execute programs in several programming
languages.

userid    A userid is an integer value, usually associated with
a login name, used by the system to identify owners
of files and directories. The userid of a process
becomes the owner of files created by the process
and descendent (forked) processes.

utility    A utility is a standard, permanently available program
used to perform routine functions or to assist a
programmer in the diagnosis of hardware and
software errors, for example, a loader, editor, debugging,
or diagnostics package.

variable

□ A variable in a computer program is an object
whose value may change during the execution
of the program, or from one execution to the
next.

□ A variable in the shell is a name representing a
string of characters (a string value).

□ A variable normally set only on a command line
is called a parameter (positional parameter and
keyword parameter).

□ A variable may be simply a name to which the
user (user-defined variable) or the shell itself
may assign string values.

white space    White space is one or more spaces, tabs, or newline
characters. White space is normally used to separate
strings of characters, and is required to separate the
command from its arguments on a command line.

window    A window is a screen within your terminal screen
that is set off from the rest of the screen. If you
have two windows on your screen, they are
independent of each other and the rest of the screen.

The most common way to create windows on a
UNIX system is by using the layers capability of the
TELETYPE 5620 Dot-Mapped Display. Each window
you create with this program has a separate shell
running it. Each one of these shells is called a layer.

If you do not have this facility, the shl(1) command,
which stands for shell layer, offers a function similar
to the layers program. You cannot create windows
using shl(1), but you can start different shells that
are independent of each other. Each of the shells
you create with shl(1) is called a layer.
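
The kinds of shell variable distinguished under the variable entry, together with the role of white space on a command line, can be sketched in a few lines. The procedure name greet and the variable names are assumptions made for illustration.

```shell
# greet is a hypothetical shell procedure; inside it, $1 is a positional parameter.
greet() {
    echo "hello, $1"
}

# A user-defined variable: a name to which a string value is assigned.
name="world"
greet "$name"

# A keyword parameter: name=value placed on the command line of one command.
# White space separates the command from its arguments.
TERM_NAME=vt100 sh -c 'echo "terminal: $TERM_NAME"'
```

The keyword parameter is visible only to the command it precedes; the user-defined variable persists in the current shell.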

word    A word is a unit of storage in a computer that is
composed of bytes of information. The number of
bytes in a word depends on the computer you are
using. The AT&T 3B Computers, for example, have
32 bits or 4 bytes per word, and 16 bits or 2 bytes
per half word.
A

accept 6: 38
access permissions 7: 4 
access routines 11: 43 
A_COLOR 10: 49 
add-books 3: 38 

addch 10: 20-21, 23-24, 39, 58, 76 
additive operators 17: 17 
addr 4: 49, 51 
address 

physical 11: 3, 16; 12: 2 

virtual 11: 3, 14 
addscr 3: 46 

addstr 10: 14, 20, 23-24, 38 
admin 14: 2-3, 26-29, 38, 40-41 
advisory locking 7: 2 
algorithm 12: 27 
ALIGN 12: 12 
align 12: 6 

a.out 2: 11, 67; 3: 21; 8: 2, 4, 9-12, 14, 25, 

32, 38, 40-41, 50, 58; 11: 7; 12: 23 
application programming 1: 8; 3: 1-2 
ar 2: 70; 12: 23 
archive 2: 70 

archive library 8: 2-3, 5-7, 9-10, 12, 28, 33; 
12: 23-24; 13: 11, 15 

compatibility 8: 41 
arg 9: 50, 52 
argc 2: 14, 17 
argument 2: 16; 16: 3-4 

command line 2: 14, 16; 4: 47 
argv 2: 14, 16-17 

arithmetic conversion 17: 10, 15-19 



arithmetic types 17: 7 

array 2: 14; 11: 38; 15: 5; 17: 53 

as 11: 1; 12: 17 

ASCII character-coded integer values 2: 23 

assembly language 2: 5, 8, 51; 12: 17 

assignment operator 16: 11; 17: 21 

assignment statement 12: 5-6 

attroff 10: 42

attron 10: 42, 47

attrset 10: 39-40, 42, 47, 100

auto 17: 23-24 

auxiliary symbol table 11: 35-37, 40, 42 
awk 2: 6; 3: 6-7; 4: 1-31, 33-52, 54-64 

arrays 4: 33 

built-in variables 4: 20 

error messages 4: 11

fields 4: 4 

functions 4: 9 

pattern ranges 4: 19 

patterns 4: 7 

regular expressions 4: 15 
statements 4: 30 
string functions 4: 23 
strings 4: 23 

user-defined functions 4: 36 

B 

BASIC 2: 5, 10 
.bb 11: 20 
bc 2: 7
beep 10: 54 
bessel 3: 27 
.bf 11: 22 
binding 12: 2 
bit masks 10: 40 
block 11: 20, 39; 17: 37

inner 11: 20 

nested 11: 21 
box 10: 70 

branch delta 14: 8-9, 16 

branch table 8: 12, 26, 30 

branch table specifications 8: 20 

break statement 16: 5-6; 17: 41 

breakpoint 11: 15; 15: 1, 8-12 

.bss 11: 12, 28; 12: 13-14, 16, 20-22, 26 

buf 9: 52 

bug 15: 11; 16: 4-5, 7 
BullsEye 10: 4, 6 

C 

C compiler 16: 11-12 

C compiler control lines 17: 47-48 

C language 2: 3, 7, 10, 13, 18, 51; 5: 1, 19 

can_change_colors 10: 49 

captoinfo 10: 8-9, 92 

cbreak 10: 55, 59, 104 

cc 2: 9-12, 54 

cdc 14: 33 

cflow 2: 51, 53-54; 8: 43 
char 16: 8; 17: 6, 9 
character 17: 9 
character constant 17: 3-4 
checksum 14: 40 
child process 2: 39; 7: 16 
chkshlib 8: 46-49 
chmod 7: 19; 14: 10 
chtype 10: 21, 39 
clear 10: 20, 28 
clearok 10: 28 
clear_screen 10: 76
close 4: 42 

clrtobot 10: 20, 29, 104 
clrtoeol 10: 20, 29, 31, 104 



cmd 9: 16, 49, 53 
COBOL 2: 4 

codes, operation permission 9: 9 
COFF. See Common Object File Format 11: 1

color 

HSL notation 10: 44 

initializing 10: 45 

RGB notation 10: 44, 49 
color manipulation 10: 44 
color_content 10: 49
COLOR_PAIR 10: 47 
color-pair 10: 44 

initializing 10: 47 
color-pairs table 10: 46 
colors program 10: 49, 111 
colors table 10: 44 

default 10: 45 

redefining 10: 48 
comb 14: 35 
comma operator 17: 22 
command language 12: 1 

link editor 12: 4 
commands, executable 13: 9 
comment 17: 2 

Common Object File Format (COFF) 2: 8; 3: 
22; 11: 1, 3, 6; 12: 13 

symbol table 11: 18 
compiler diagnostics 2: 10 
compiling C programs 2: 9 
compound statement (block) 17: 37 
conditional operator 17: 21 
conditional statement 17: 38 
constant 17: 3-4, 13 

character 17: 3-4 

decimal 17: 3 

enumeration 17: 4 

floating 17: 4 

integer 17: 3 
octal 17: 3
continue statement 16: 5; 17: 41
control command 9: 10-11, 42, 50, 53-54, 75 
control-d 15: 12 
conversion 17: 9 

arithmetic 17: 10 

characters and integers 17: 9 

float and double 17: 9 

floating and integral 17: 9 

pointers and integers 17: 10 

unsigned integer 17: 10 
COPY section 12: 30 
cpp 2: 31 

cross-compiler 11: 3 
ctl 3: 19 

ctrace 2: 54-55, 57
ctype 2: 22 

curses 2: 7, 32; 3: 20; 10: 1-4, 6-8, 10-13, 

15-16, 19-20, 31, 56, 69, 71-72, 80, 93-94 
curses library 10: 5 
<curses.h> 10: 3-4, 11-13, 16, 21, 76 
cursor optimization 10: 3 
cxref 2: 58, 62 
D

.data 12: 13-14, 16, 19, 26, 31
data structure 9: 36-39 
data symbol 8: 28 
dc 2: 7

deadlock 7: 17, 20 
debug 15: 6, 13 
decimal constant 17: 3 
declaration 17: 23, 59 

auto 17: 23 

enumeration 17: 31 

implicit 17: 35 

register 17: 23 



static 17: 23 

structure 17: 28 

union 17: 28 
declarator 17: 25-26 
delta 14: 1-6, 16, 18, 21, 23-26, 33 
delta 14: 2, 7-9, 18 

branch 14: 8, 16 

numbering 14: 7 

trunk 14: 16 
dependency 13: 9 
derived type 11: 31 
description file 13: 8-10 
desk calculator 6: 45, 48 
diff 14: 23 
digit 17: 3-4 
do statement 17: 38 
double 16: 8; 17: 14 
doupdate 10: 12, 61-62, 72 
DSECT 12: 30 
dummy section 12: 30 
dump 8: 14, 46 

E 

.eb 11: 20 
echo 2: 42; 10: 58 
ed 14: 40; 15: 6 
editor 10: 93-94 
editor program 10: 93 
.ef 11: 22 

else statement 17: 38 
end-marker, yacc 6: 6 
endwin 10: 4, 11, 13-14, 26 
entry point 12: 23 
enumeration constant 17: 4 
enumeration declaration 17: 31 
environ 2: 17 
environment variable 13: 23
envp 2: 17

.eos 11: 28 

epsilon 6: 5 

equality operator 17: 19 

erase 10: 20, 28 

errno 2: 44; 9: 3, 17-18 

error 6: 28-30, 38 

error handling, yacc 6: 28 

error message 14: 11; 16: 5 

exec 2: 38-42 

execle 2: 39 

execlp 2: 39 

executable object module 2: 38 
execv 2: 39 
execve 2: 39 
execvp 2: 39 

exit 2: 12, 35; 12: 23; 16: 6 
expression 17: 11-13, 57 

primary 17: 12 
extern 17: 24, 43, 46
external data definition 17: 44 
external definition 17: 43, 63 
external function definition 17: 43 
external symbol 8: 28 
F

fcntl 7: 6, 9, 14-17
fcntl.h 3: 17
fgets 2: 27
field variable 4: 28
file, description 13: 8 



file descriptor 2: 33 

file header 11: 4-7 

file inclusion 17: 48 

file locking 3: 14-15; 7: 1, 20 

file retrieval 14: 3 

flash 10: 54 

float 16: 8; 17: 14 

floating constant 17: 4 

fopen 2: 34-35 

for statement 16: 5; 17: 38 

fork 2: 38-39, 41-42 

format specifier 15: 4 

FORTRAN 2: 4, 7, 10; 5: 9 

fprintf 2: 34

function 2: 19-20; 11: 22, 37-39; 16: 4-6; 
17: 52 

bessel 3: 27 

conversion 2: 24 

hyperbolic 3: 28 

string handling 2: 20 

trigonometric 3: 27 
function call 15: 3, 11; 16: 12; 17: 14 
function name 2: 24 



G

get 3: 18; 14: 3-5, 13-18, 21-23, 25, 28, 33-34

getc 2: 35; 10: 3 

getch 10: 3, 31-33, 35, 60, 76 

getchar 5: 10; 10: 31; 16: 9 

getline 4: 44-45, 47 

gets 2: 25, 27-28, 34 

getstr 10: 31, 35, 37, 60 

global data 8: 28-29, 56 

goto statement 16: 5; 17: 42 

graphics 10: 69 

gsub 4: 24 
H

hardware and performance 8: 45
header
    optional 11: 6, 8

section 11: 9-12 
header file 2: 30 

<curses.h> 10: 11 
header files 10: 75-76 
help 14: 6, 31 
hexadecimal digit 17: 3 
highlight 10: 100 
highlight program 10: 100 
host shared library 8: 10, 22-23, 37, 40, 47, 
50 

I 

idlok 10: 104 

if statement 17: 38 

implicit declaration 17: 35 

imported symbol 8: 38 

inch 10: 40 

include 13: 20 

index 4: 25 

INFO section 12: 30 

infocmp 10: 8-9, 91 

.init 12: 13, 27 

init 10: 10 

init_color 10: 45, 48, 53 

initialization 14: 27; 16: 11; 17: 32-33 

init_pair 10: 46-47, 51 

initscr 10: 4, 6, 11-13, 26, 28, 71 

inner block 11: 20 

input 5: 10 

input files, non relocatable 12: 31 
insert line 10: 80 
interface 2: 13

Inter-Process Communication (IPC) 3: 18-19; 
9: 1 

interrupts 2: 44 
intro 2: 44 

I/O 2: 3, 33, 35; 5: 10, 15; 10: 39, 60, 69 
I/O subroutines 2: 19-20 
IPC. See Inter-Process Communication 9: 1 
ipc_perm 9: 6 

K 

key 9: 6-7, 9, 11-12, 38 
keyboard-entered capabilities 10: 87 
keyletters 14: 21 
keypad 10: 31 

keyword 14: 27, 29, 31; 17: 2 
kill 2: 44 

L 

labeled statement 17: 42 
labels 10: 71 

soft 10: 71 
languages 

assembly language 2: 5 

BASIC 2: 5 

C 2: 3 

COBOL 2: 4 
FORTRAN 2: 4 
Pascal 2: 4 

special purpose, awk 2: 6 
special purpose, M4 2: 7

special purpose, yacc 2: 7 
lck 7: 14
-lcurses 10: 3
lcurses 2: 32; 10: 15

ld 2: 11; 11: 1; 12: 1-4, 6-8, 11, 13, 16-20, 23-24, 26-27, 30-32
ldfcn 3: 26
ldopen 3: 26
length 4: 25; 12: 8 

lex 2: 7, 51; 3: 7-11; 5: 1-16, 18-21; 6: 12; 
16: 6 

actions 5: 6 

definitions 5: 12 

fundamentals 5: 3 
specification 5: 1

subroutines 5: 14 

tool 5: 1 

usage 5: 7 
lexical analysis 6: 10, 50 
lexical analyzer 5: 2, 5, 19; 6: 1-2, 4, 6-7, 

10-12, 36, 42, 45 
lexical conventions 17: 2 
lexical scope 17: 45 
.lib 8: 2 
libc.a 2: 18
liber 3: 1, 37 
library 2: 30 

archive 8: 2; 12: 23; 13: 15 

curses 10: 1 

host 8: 22, 37 

math 3: 26 

networking 8: 3-4, 17-18 
object file 2: 32; 3: 23 
target 8: 18, 22
line numbers (COFF) 11: 16-17
link editor 2: 8-12, 17, 32, 38; 3: 21; 12: 1-14, 16-23, 26-36
link editor command language 12: 1, 4 
lint 2: 63; 3: 31; 16: 1-12 
lint usage 16: 2 
local symbols 11: 20 
lock 7: 6, 8, 12, 14, 16-18 

read 7: 2, 4, 10 

record 7: 6, 10-11 

write 7: 2, 4, 10 
lockf 3: 17; 7: 6, 8-9, 11-12, 16-17 
locking 7: 1, 5 

advisory 7: 2, 19 

file and record 7: 1, 20 

mandatory 7: 3, 18-19 
long 16: 8-9; 17: 3 
longname 10: 10 
ls 7: 19; 14: 40
lseek 7: 9

lvalue 17: 8, 14-16, 21 

M 

M4 2: 7

machine language 15: 11 

macros, make 13: 4, 9-11, 14, 16-17 

mail 7: 4 

main 2: 14; 5: 19; 6: 32; 8: 38; 15: 3 
make 2: 68-70; 3: 33-34; 13: 1-26 
makefile 3: 33; 13: 1-2, 10, 15-16, 19-21, 

23-24 
mallinfo 3: 13 
mallopt 3: 13-14
mandatory locking 7: 3, 18 
mark 12: 19 
match 4: 25 
math library 3: 26 
member 17: 30 

memory 3: 13; 8: 45; 9: 68; 12: 1, 7-8, 20, 
25-27 

memory configuration 12: 1, 7 

message 9: 2-4, 6-7, 22; 16: 4-12 

message queue 9: 4, 6, 15, 18 

message queue identifier (msqid) 9: 3, 18 

mkshlib 8: 1, 16, 19, 22, 40, 57 

mode 2: 34 

modification request 14: 24 

module 2: 38 

move 10: 14, 20, 26 

msgctl 9: 2, 7, 15-16, 18-19, 22, 26 

msgflg 9: 9 

msgget 9: 2, 6, 8-11, 13, 16 
msgop 3: 19; 9: 7, 22, 29, 33 
msgrcv 9: 22, 27-28 
msgsnd 9: 22-23, 27 
msqid 9: 6-9, 11-12 
multiplicative operators 17: 16 
mvaddch 10: 20, 26 
mvaddstr 10: 20, 26 
mvinch 10: 93 
mvprintw 10: 20 

N 

name 17: 2 

tag 11: 36 
name field 11: 24 
nested block 11: 21 
newwin 10: 66 
nm 2: 71-76 



nobreak 10: 55 
nocbreak 10: 58-59 
nodelay 10: 57 
noecho 10: 55, 58, 104 
NOLOAD section 12: 30 
noraw 10: 57 
null statement 17: 42 

O 

object 17: 8, 15, 17 

object file 2: 8; 3: 26; 8: 21; 11: 1-4, 7, 9, 
13, 15; 12: 1-3, 5 
relocatable 12: 3 
object file format 11: 2 
octal constant 17: 3 
octal value 9: 9, 42 
op 3: 19 
open file 7: 4 

operation permissions 9: 42, 75 
operator 

additive 17: 17 

assignment 17: 21 

comma 17: 22 

conditional 17: 21 

equality 17: 19 

multiplicative 17: 16 

relational 17: 18 
operator conversion 17: 9 
opterr 2: 53 
optional header 11: 6, 8 
origin 12: 8 
output 5: 10 

output file blocking 12: 31 
output section 11: 13 
outsec 12: 11, 17 
outsecl 12: 10, 20 
OVERLAY section 12: 31 
P

pad 10: 19
pads 10: 19, 60-61 
paging 8: 42-43 
pair_content 10: 49
PAIR_NUMBER 10: 49 
parameter 13: 20 

parameter string capabilities 10: 88 
parent process 2: 39; 7: 16 
parent process ID (PPID) 2: 37 
parse 2: 7; 6: 1, 13-15, 17, 19, 23, 28-29; 17: 2

passwd 14: 10 
pathname 2: 34 
pattern matching 15: 4 
pattern ranges, awk 4: 19 
patterns, awk 4: 7, 12, 18 
patterns awk 2: 6 
pclose 2: 42 
permissions 7: 4 
perror 2: 44 

physical address 11: 3, 16; 12: 2 

pid 2: 44 

pipe 2: 42; 4: 41 

pointer 2: 16; 15: 5; 16: 8, 12; 17: 10, 13-15, 17-19, 21, 53
pointer alignment 16: 12 
pointer conversion 17: 54 
popen 2: 42 
popen pipe 2: 43 
portability 10: 48; 17: 56 
pr 14: 35 

preprocessor 2: 7, 31; 17: 64 
prgm 15: 2 
printf 2: 34; 4: 39-40; 8: 4, 26; 10: 3, 24, 76, 88
    awk 4: 7
printw 10: 3, 20, 24



process 2: 37-38; 7: 2; 9: 2 

child 7: 16 

parent 7: 16 
process ID (PID) 2: 37 
prof 2: 64 
.profile 10: 6-7, 15 
program 

colors 10: 49, 111 

editor 10: 93 

highlight 10: 100 

scatter 10: 102 

show 10: 104 

two 10: 106 

window 10: 109 
programming 

application 1: 8; 3: 1-2 

single-user 1: 7 
programming environment 1: 7 
programming language 2: 2-3 
prs 14: 29 
ps 2: 37 

putchar 5: 10; 10: 80 
putp 10: 76, 80 

R 

raw 10: 55 

read lock 7: 2, 4, 10, 15 

record 4: 44; 7: 2, 10, 12 

record locking 3: 14-15; 7: 4, 6, 10-11, 19-20 

recursive 6: 34-35 

reduce 6: 13-14 

refresh 10: 11, 13-16, 26, 28, 30, 33, 38, 100 

register declaration 17: 23-24 

registers, sdb 15: 12 

regular expression 15: 7 

relational operator 17: 18-19 

relocatable symbols 11: 27 

relocation 11: 13 
reset 10: 10
restate 2: 69 

return statement 16: 5; 17: 41 
rmdel 14: 32-33 

S 

sact 14: 31 

saving space 8: 6-7 

scanf 10: 37 

scanw 10: 31, 37-38 

scatter program 10: 102 

sccs. See Source Code Control System 14: 1

sccs (Source Code Control System) 2: 77

sccsdiff 14: 35 

sccsfile 14: 40 

schmctl 9: 82-83 

scope 17: 45-46 

screen management programs 10: 1 
screen-oriented capabilities 10: 86 
scrollok 10: 100 

sdb 2: 8, 11, 66-67; 3: 30-31; 6: 33; 8: 14; 15: 1-13, 15
section 11: 3, 10, 13, 36; 12: 2 

COPY 12: 29-30 

DSECT 12: 29 

INFO 12: 29-30 

NOLOAD 12: 29-30 

OVERLAY 12: 29, 31 
section definition directives 12: 9 
section header 12: 2 
section header (COFF) 11: 9-12 
section number (COFF) 11: 28-29 
semaphore 9: 34-43, 45, 48-49, 51, 53, 55, 60, 

62-64, 67 
semaphore set 9: 36-37, 39 
semaphore set identifier 9: 63 
semflg 9: 41-42

semget 9: 34, 38-40, 44-45, 48, 61 
semid 9: 39, 41, 44-45, 49 
semnum 9: 49 

semop 3: 19; 9: 34-35, 39, 60-61 
set_term 10: 73
setupterm 3: 20; 10: 6, 76 
shared library 3: 29; 8: 1-23, 25-33, 35-37, 
39-42, 44-46, 55-61 

branch table 8: 13 

building 8: 16, 59 

compatibility 8: 45 

contents 8: 19 

debugging 8: 14 

example 8: 50 

global data 8: 27 

guidelines 8: 24 

source files 8: 51 

space 8: 7, 13 
shared memory 9: 68-69, 71-73 

operations 9: 89 
shared memory identifier, shmid 9: 69 
shell 1: 5, 9; 2: 16, 37; 4: 49 
shift operator 17: 18 
shkshlib 8: 45 
shlib 8: 18 
shmat 9: 89 
shmctl 9: 81, 84, 88
shmdt 9: 89-90 
shmget 9: 68, 72-74, 77-80 
shmid 9: 69, 90 
shmop 3: 19; 9: 89-90, 92, 95 
short 16: 8 

show program 10: 104 

signals 2: 44, 46 

single process program 10: 72 

single-user programmer 1: 7; 2: 77 

size 2: 66 
slk_attron 10: 72
slk_attrset 10: 72
slk_clear 10: 72
slk_init 10: 71
slk_init 10: 71
slk_noutrefresh 10: 71-72
slk_refresh 10: 72
slk_set 10: 71
sops 9: 61, 64 
source code 8: 5-6; 16: 3 
Source Code Control System (SCCS) 2: 77; 
3: 34-36; 13: 18-19; 14: 1-41 

command conventions 14: 10 

commands 14: 11-12 

comments 14: 27 

file retrieval 14: 15-16 

files 14: 38 

formatting 14: 39 

makefiles 13: 20 
source file 8: 51, 56; 15: 6-7 

shared library 8: 51-55 
special symbol 11: 18-20, 26 
specification file 8: 30, 57-58 
specifications 

lex 5: 1 

yacc 6: 34 
specifier 15: 4; 17: 25 

format 15: 4 

type 17: 24 
sprintf 2: 39; 4: 26 
srcdir 2: 67 
standend 10: 40, 43 
standout 10: 40, 43 
start_color 10: 12, 46-47, 50 
statement 16: 5, 12; 17: 62 

conditional 17: 38 
static data 8: 26
static declaration 17: 23 
stderr 2: 33-34, 42, 44 
stdin 2: 33-34, 42 
stdio 10: 1, 20, 31
<stdio.h> 10: 13 
stdio.h 2: 28, 30, 33-34 
stdout 2: 33-34, 42, 55; 10: 3 
stdscr 10: 3, 16, 18, 20-21, 23-24, 26, 28, 39, 

60, 71, 93, 109 
storage class 11: 25-29, 31-32; 17: 6, 35-36, 
43-44 
storage class specifiers 17: 23
strcmp 8: 14 
stream 2: 28 
streams 2: 33 
string literal 17: 5, 13 
string operations 2: 20-21 
string table 11: 42-43 
strip 2: 66; 11: 2 
structure declaration 17: 28, 30 
structure tag 17: 29-30 
sub 4: 24 

subroutine 2: 7, 17, 24, 32-33, 35, 38 

subscript 17: 53 

substr 4: 27 

subwin 10: 66-67 

suffixes 13: 12-13 

superuser 7: 4 

switch statement 17: 39-40 

symbol 8: 32; 11: 22 

data 8: 28 

external 8: 28, 30 

imported 8: 32, 38 

local 11: 20 

special 11: 18-19 

static library 8: 29 
symbol table 11: 16-19, 22-23, 29, 31, 33-35
symbol value field 11: 26
system 2: 38-39, 41; 4: 49
system call 2: 17, 24, 32-33, 35-36; 9: 2, 7, 
9-12, 16, 34 

msgctl 9: 18-19, 22 
msgrcv 9: 28

semctl 9: 55, 60 

semop 9: 64, 67 

shmctl 9: 84, 88 

shmget 9: 74, 79-80 

shmop 9: 92, 95 

smgget 9: 11 
systems programmers 1: 8 
T

table

auxiliary symbol 11: 35 

branch 8: 11 

entry 11: 38 

string 11: 42-43 

symbol 11: 23 
tag, structure 17: 29-30 
tag name 11: 36-37 
target file 13: 2 

target library 8: 10-11, 18, 22, 32, 43, 47, 50 
target machine 11: 3-4; 12: 1, 7
target shared library 8: 23 
TERM 10: 7, 9, 90 
termcap 10: 9, 83, 91-92 
<term.h> 10: 76 
term.h 10: 77, 80 
terminal capabilities 10: 82 
basic 10: 85 
TERMINFO 10: 9, 89-90 
terminfo 3: 19; 10: 1-3, 5-10, 13, 75-77, 79, 
81, 83, 92 
terminfo database 10: 1, 46 
<termio.h> 10: 13 
.test 12: 13-14 
.text 12: 16-17, 26-27, 31 
tic 3: 20; 10: 5, 8-9, 89-90 
tilde 13: 18-19 
token replacement 17: 47 
tools 1: 4 

lex 5: 1 

project control, make 3: 33 

project control, SCCS 3: 34 

prototyping 1: 5 

yacc 6: 1 
touchwin 10: 109 
tparm 10: 88 
tput 10: 8, 10, 80, 90-91 
tputs 10: 76, 79 
tree structure 14: 8 
trunk delta 14: 16 
two program 10: 106 
type 17: 6 

derived 11: 31 
type entry 11: 29, 31-32 
type name 17: 34 
type specifier 17: 24 
typedef 11: 40; 17: 30, 36 
underline char 10: 80 
union 6: 40 
union declaration 17: 28 
unlock 7: 14 
unput 5: 15 
unsigned 16: 8; 17: 7, 16, 29 
unsigned integer 17: 7, 10 
val 14: 36 
variable 15: 3; 16: 4-5, 12 

environment 13: 23 
vc 14: 37 
version control 17: 50 
vi 2: 37 
video attributes 10: 44 
virtual address 11: 3, 14, 27; 12: 7 
void 17: 11 
wc 2: 42 
what 14: 34 
WINDOW 10: 16 
window program 10: 109 
windows 10: 11, 18-19, 60 
wnoutrefresh 10: 12, 61-62, 71 
write lock 7: 2, 4, 10 
yacc 2: 7, 51; 3: 5-8, 10-12; 5: 8, 12, 14, 16, 
19-20; 6: 1-15, 17-42, 44-45, 48-50, 57; 
16: 6 

actions, reduce 6: 13 

error handling 6: 28 

specifications 6: 4, 34-37 
YYERROR 6: 38, 49, 57 
yyparse 3: 10; 6: 32, 38 
yytext 5: 6, 8-11, 15 
GENERAL 



Owner/Operator Manual (305-665) 

This illustrated document explains how 
to set up your 3B2 Computer. It 
describes the 3B2's standard and 
optional hardware and software, 
introduces some basic computer 
concepts, and explains how to do tasks, 
such as assigning logins and passwords 
with the setup command. The 
document also describes how to 
implement security measures, install 
utilities packages, and use the System 
Administration menus. 

User's Guide (307-231); ■ 3.2 Update 
(305-660) 

This document presents an overview of 
the UNIX operating system; it includes 
tutorials on the line editor (ed), the 
screen editor (vi), shell programming, 
electronic mail, sending and receiving 
files, and networking. 

User's and System Administrator's 
Reference Manual (305-646); ■ 3.2 
Updates (305-649), (305-648) 

This document describes each of the 
UNIX system user and administrator 
commands. Each description includes a 
synopsis of the command's syntax and 
an explanation of how it is used; also 
supplied, where appropriate, are 
diagnostic indications, warnings, 
examples of use, and where to find 
related information. 



User Interface Utilities Release 1.1 
Release Notes (305-653) 

This document contains release notes 
for release 1.1 of the Form and Menu 
Language Interpreter (FMLI) and 
Framed Access Command Environment 
(FACE) user interface. These products 
are supported on the AT&T 3B2 series 
of computers running UNIX System V 
Release 2.0 or later. 

Framed Access Command 
Environment (FACE) User's Guide 
(305-651) 

This document describes how to use the 
FACE interface to the UNIX system, 
which provides an electronic "office" 
from which you can easily select 
commands that complete many 
conventional office tasks, such as 
organizing your file cabinet, collaborating 
on projects, and working on several 
tasks at once. The document also 
explains how to access spreadsheet, 
word processing, business graphics, or 
UNIX system applications, by selecting 
them from the FACE Services menu (if 
they have been added to the FACE 
system), or through the UNIX system 
shell. 



SYSTEM ADMINISTRATION 

System Administrator's Guide 
(305-645); ■ 3.2 Update (305-650); 
■ RFS 1.2 Update (305-656) 

This document explains how to perform 
administrative tasks. Tasks are 
organized by major subject areas: the 
processor, the file system, user services, 
disk, tape, printer, tty management, and 
networking, among others. Appendices 
cover device names and disk 
partitioning, directories and files used by 
the administrator, and error messages. 

System Performance Analysis Utilities 
(SPAU) Guide (305-607) 

This document describes commands for 
collecting and examining system usage 
data. It explains how this data can be 
used to analyze the present performance 
of the computer and to determine load 
balancing and system-tuning strategies 
that will improve performance. 

2K File System Utilities Release 1.0 
Release Notes (305-657) 

This document describes the 2K file 
system, a new file system type for 3B2 
Computers running UNIX System V 
Release 3.2. The document shows how 
a 2K file system can improve system 
performance for some applications 
performing large data transfers. 



PROGRAMMING 



Programmer's Guide (308-139); 
■ 3.2 Update (305-662) 

This document is divided into two parts. 
Part 1 discusses the UNIX system 
programming environment and utilities 
(e.g., compilers, debuggers). Part 2 
provides details of the C language, the 
Common Object File Format, and the 
link editor. It also includes tutorials on: 
shared libraries, curses/terminfo, File 
and Record Locking, Inter-Process 
Communication facilities, awk, lex, yacc, 
lint, sccs, sdb, and make. 

Programmer's Reference Manual 
(307-013); ■ 3.2 Update (305-663) 

This document provides manual pages 
describing the programming features of 
UNIX System V: commands, system 
calls, subroutines, libraries, file formats, 
macro packages, and character set 
tables. Syntax and examples are given 
as appropriate. 

C Programming Language Utilities 
Issue 4.2 Release Notes (308-198) 

This document contains installation and 
other information specifically relating to 
CPLU Issue 4.2 on a 3B2 Computer. 

Advanced Programming Utilities 
Issue 1.1 Release Notes (307-008) 

This document provides installation and 
other specific information about APU 
Issue 1.1 on the 3B2 Computer. APU 
contains lint, lex, yacc, sdb, make, 
sccs, commands to handle shared 
libraries, and more. 

Form and Menu Language Interpreter 
(FMLI) Programmer's Guide (305-652) 

This document describes a high-level 
"shell-like" language for defining forms, 
menus, and other types of frames, as 
well as screen labeled keys, a message 
line, a command line, and a banner. 
The Interpreter handles the details of 
frame creation, placement, navigation 
between frames, and processing the use 
of forms and menus. 

Extended Terminal Interface 
1.0 Release Notes (305-664) 

This document explains how to install 
the Extended Terminal Interface (ETI) 
software on a 3B2 Computer and 
presents usage notes. 

Extended Terminal Interface 
Programmer's Guide (305-658) 

This document shows how to write C 
language programs that call the 
Extended Terminal Interface (ETI) 
routines to create, display, and change 
forms, menus, and panels (windows with 
depth relationships). Also included is a 
description of how the ETI routines 
interact with curses routines and the 
terminfo database. It contains manual 
pages for the high-level ETI function 
libraries, the low-level curses(3X) 
library, and the tam(3X) transition library. 



PERIPHERALS 

Small Computer System Interface 
Installation Manual (305-011) 

This document describes how to add a 
Small Computer System Interface (SCSI) 
Bus to a 3B2 Computer and how to add 
external SCSI peripherals to the bus. 
Illustrations support the installation 
procedures. 

Small Computer System Interface 
Operations Manual (305-012) 

This document explains how to set up 
and use the Small Computer System 
Interface (SCSI) on the 3B2 Computer. 
It describes the hardware and software 
that are available with the different SCSI 
packages. In addition, operation 
information is given for associated 
peripherals. Special instructions tell how 
to back up and restore file systems 
using the different SCSI devices. 
Appendices cover topics such as default 
partitioning and error messages. 



NETWORKING 



Networking Support Utilities Release 
1.2 Release Notes (305-654) 

This document explains how to install 
the Networking Support Utilities, which 
supplement the Essential Utilities by 
extending system capabilities to support 
networking applications. This product is 
required if your system has Remote File 
Sharing, STREAMS mechanisms and 
tools, the Transport Interface, 
Media-Independent uucp, and the Listener. 

Network Programmer's Guide (307-230) 

This document introduces the AT&T 
Transport Interface, its capabilities, and 
its applications. It covers: the goals of 
the transport interface, with discussions 
of OSI, transport protocols, and 
STREAMS; the transport interface 
routines in the Network Services Library; 
and the key areas of the development of 
applications that interface to transport 
protocols (with illustrated examples). 

Remote File Sharing Utilities Release 
1.2 Release Notes (305-655) 

This document explains how to install 
the Remote File Sharing Utilities 
package, which provides the facilities 
needed to share remote files 
transparently among computers. It 
requires the Networking Support Utilities 
and a Transport Provider (such as the 
STARLAN NETWORK). 

STARLAN NETWORK Introduction 
(989-100) 

This document describes the STARLAN 
NETWORK, a baseband, 1 megabit per 
second, local area network that provides 
simple, reliable connections (logical and 
physical) between two or more devices 
on the network. The document explains 
how the STARLAN NETWORK can 
solve information-sharing problems, 
lists a broad range of computers and 
terminals that can be linked to the 
STARLAN NETWORK, and explains 
how to share data among them while still 
controlling access to files. A case study 
introduces the hardware and software, 
and shows how to expand the network 
using Network Extension Units, Room 
Stars, and Closet Stars. 

STREAMS Primer (307-229) 

This document presents a high-level 
technical overview of STREAMS. It 
includes a summary of the STREAMS 
mechanism, a description of the 
applications and benefits of STREAMS, 
illustrations and definitions of STREAMS 
terminology, a simple example 
(discussed from both the applications 
and systems programmers' points of 
view), a discussion of the facilities 
provided by STREAMS, and a 
comparison of certain features of 
character input/output device drivers to 
STREAMS modules and drivers. 

STREAMS Programmer's Guide 
(307-227); ■ 3.2 Update (305-661) 

This document describes the user-level 
STREAMS facilities available to 
applications programmers, how to use 
STREAMS facilities to write UNIX 
system kernel modules and device 
drivers, and it includes a summary of 
kernel-level data structures, STREAMS 
message types, and specifications of 
kernel utility routines. 